The six-month progress report summarizes what the author has learned, the gaps in their understanding, outputs so far, and plans until December 2008. The author describes settling into a new environment and learning its regulations, as well as gaining knowledge of thesis writing, LaTeX, databases, and biological text mining, both in theory and hands-on. The report also outlines plans to keep developing skills in writing, coding, and presenting results.
2. In this report
● What I have learnt
● What are the gaps in my understanding
● Outputs so far
● Reflection on supervision mode
● Plan outline until December 2008
3. 1. What I have learnt – general
● General
  – Settled down in a new environment
  – Learnt some of the regulations and how things work in
    ● The country
    ● The city
    ● The university
    ● The faculty
    ● The school
4. What I have learnt – less general
● Less general
  – Thesis and paper writing theory and practice
    ● Specifically through the CS7100 seminar
  – LaTeX
  – Coding infrastructure
    ● Warmed up!
  – Database handling
  – Administration / web applications
● Specific
  – Biological text mining theory
5. Biological text mining
● Biological text mining theory
  – Main problems
  – Main challenges
  – Main approaches
  – Communities
  – Events, papers, journals, competitions, etc.
    ● 40+ papers in my CiteULike account
● Biological text mining hands-on
  – Tools, techniques, and resources
  – i2b2
  – HIV
6. Biological text mining theory
● Main problems
  – Information retrieval
  – Information extraction
    ● Relation extraction
  – Shallow parsing / chunking
  – POS tagging
  – Word sense disambiguation
  – Term variation
7. Biological text mining theory (cont.)
● Main problems (cont.)
  – Named entity recognition
    ● Dictionary based
    ● Rule based
    ● Machine learning (HMM: Zhou et al.)
    ● Hybrid
  – Evaluation
    ● Precision, recall, F-score
    ● Sensitivity and specificity
    ● Not always possible due to the lack of
      – Test corpora
      – Common domains, techniques, goals
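The evaluation measures named on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the project: the `Counts` class and the function names are made up for the example, and the counts would come from comparing system output against a test corpus.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one evaluation run (hypothetical container)."""
    tp: int      # true positives: correct items the system found
    fp: int      # false positives: items the system found that are wrong
    fn: int      # false negatives: correct items the system missed
    tn: int = 0  # true negatives: often not well defined for extraction tasks


def precision(c: Counts) -> float:
    """Fraction of returned items that are correct."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    """Fraction of correct items that were returned (same as sensitivity)."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f_score(c: Counts, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    p, r = precision(c), recall(c)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def specificity(c: Counts) -> float:
    """Fraction of true negatives among all actual negatives."""
    return c.tn / (c.tn + c.fp) if (c.tn + c.fp) else 0.0
```

Specificity needs a count of true negatives, which is one reason it is harder to apply to open-ended extraction tasks than precision and recall.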
8. Biological text mining theory (cont.)
● Main challenges
  – Deal with sublanguage of biology
  – Build scalable and robust systems
  – Present the results in meaningful and informative ways to the biologist
  – Deal with interdisciplinary aspects
    ● Biology – chemistry – medicine
      – Different views / information needs
    ● Specific field (biomedicine) – linguistics – computation and data mining
9. Main challenges (cont.)
● Specific field (biomedicine) – linguistics – computation and data mining
  – The text is not necessarily written to be comprehensible by automatic techniques
  – The language is dramatically different from that of e.g. newswire
  – Terminology, new and coined terms, usage ambiguity
  – Non-algorithmic, irrational patterns in NL
10. Resources
● I am aware of / I am using existing resources
  – Literature repositories/search engines
    ● PubMed, MEDLINE, BioMed
    ● Google
  – Parsers
    ● Stanford Parser
    ● GeniaTagger
  – Terminological resources
    ● Gene Ontology
    ● EMBL-EBI
    ● MeSH thesaurus
    ● UMLS
    ● Gene Synonym Finder, SBO, ...
12. Resources (cont.)
● I am partially developing tools for
  – Named entity recognition
  – Relation extraction
● I am fully tackling
  ● PPI mining
  ● Word sense disambiguation
  ● Nominalization
  – I may have to tackle in future
    ● Contradiction, negation, contrasts
    ● Temporal text mining
13. 2. What I still need to learn – Specific
● There may be gaps I am unaware of
● Less of wheel reinvention
  – Use other software
    ● LingPipe, NLTK, Weka, RASP, ABNER, PIE, BioInfer, MALLET, Julielab, SPECIALIST, EMBL-EBI, GNN (Arizona Uni),
  – Use other methods/approaches
    ● Machine learning
    ● Dynamic programming
  – CL / Bio text mining theory algorithms
    ● Viterbi, HMM, NN, SVM, GA, CRF,
    ● ...
14. 2. What I still need to learn – Specific
● Make a resources list on our web page?
  – Similar to the Stanford – outdated – repository
15. What I still need to learn – Less general
● News of the field
● Areas/opportunities for research
  – Michael Phelps analogy
● Developing skills for a CV
  – Ways to prove I have the skills I already have
● Presenting results
  – Reasons, occasions, methods
  – Writing
● Other workshops by the faculty
16. What I still need to learn – General
● Writing, writing, writing
  – Binge writing vs. snacking
  – Write as you go
    ● Closer to the final output
    ● Paper-based dissertation? Something to consider.
  – Review, get feedback, rewrite
  – A pedantic editor
17. What I still need to learn – General (cont.)
● Stronger coding infrastructure
  – More reusable libraries
  – Config files
  – One-click approach
● Optimisation
  – Code
  – Database
    ● Query optimisation
    ● Database optimisation
  – Server
    ● Load balancing
  – Multi-threading
  – Multi-processor
18. 3. Outputs so far
● Written
  – Background work survey
    ● Mid-April 2008
    ● 5 pages (approx. 1000 words)
    ● Feedback from supervisor
    ● Was never written up
  – Writing sample for CS7100 seminar
    ● June 2008
    ● Same document as above, revised and rewritten
    ● 12 pages, 2215 words
    ● Feedback from Jim Miles and peer students
19. HIV
● Understanding of the problem and the goals
● Presenting the given/wanted as tables/code/query
● Building code infrastructure
  – Database tables
  – Utility libraries
  – Version control system
  – 1500+ lines of documented, reusable code
20. HIV summary
● Goal: to reproduce a human-produced table
● Each row has the following main columns
  – HIV GPN (protein name, acc, and gene ID)
  – Human GPN (protein name, acc, and gene ID)
  – A relation (interaction) between the two
  – A description of the interaction
  – The PMIDs that the interaction has been reported in
● The raw input: the full abstracts
21. HIV results
● HIV and human GPN names
  – Most were mapped to their entities
  – 1237 out of 50416 currently unmapped (2%)
● Interaction verbs
  – Interesting verbs and stems identified
  – The stems were found in the text
    ● Working on stems, so including nominals, etc.
● Terms extracted from the interaction descriptions in the original data
22. Example
● SELECT DISTINCT mention FROM index_description_term i WHERE termID = 28;
● 18 variations

CD4+ T      T4 (CD)     CD4+T
CD4, T      T4(CD)      T (CD4)
T CD4       CD4 (T)     CD4+ (T)
CD4(+) T    CD4(+)T     CD4(T)
CD4 T       CD4+T       CD(4+) T
T4+ (CD)    CD4(+)T     CD4 T
23. Example
● SELECT DISTINCT mention FROM index_description_term i WHERE termID = 28 OR termID = 17;
● 28 variations

CD4+ T      T4(CD)      CD4+ (T)       CD4(+) T cell
CD4, T      CD4 (T)     CD4(T)         CD4 Tcell
T CD4       CD4(+)T     CD(4+) T       CD4(+) Tcell
CD4(+) T    CD4+T       CD4 T          CD4(+)T cell
CD4 T       CD4(+)T     CD4+ T cell    CD4+Tcell
T4+ (CD)    CD4+T       CD4, T cell    CD4(+)Tcell
T4 (CD)     T (CD4)     CD4+ Tcell     CD4 T cell
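The two example queries can be tried against a small in-memory database. The sketch below uses Python's sqlite3 module and is only an illustration: the table and column names (index_description_term, termID, mention) come from the slides, while the demo rows, their assignment to term IDs, and the helper names make_demo_db and distinct_mentions are assumptions made for the example.

```python
import sqlite3

# Invented demo rows: (termID, mention). The real table is far larger, and
# which surface form belongs to which term ID is assumed here.
DEMO_ROWS = [
    (28, "CD4+ T"),
    (28, "CD4+ T"),       # the same mention seen in another sentence
    (28, "CD4(+) T"),
    (17, "CD4+ T cell"),
    (17, "CD4 T cell"),
    (99, "gp120"),        # an unrelated term
]


def make_demo_db():
    """Create an in-memory table with the two columns used in the queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE index_description_term ("
        "termID INTEGER NOT NULL, mention TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO index_description_term (termID, mention) VALUES (?, ?)",
        DEMO_ROWS,
    )
    return conn


def distinct_mentions(conn, term_ids):
    """Return the distinct surface forms recorded for the given term IDs."""
    placeholders = ", ".join("?" for _ in term_ids)
    sql = (
        "SELECT DISTINCT mention FROM index_description_term "
        f"WHERE termID IN ({placeholders})"
    )
    return sorted(row[0] for row in conn.execute(sql, list(term_ids)))
```

Calling distinct_mentions(conn, [28]) mirrors the first query, and distinct_mentions(conn, [28, 17]) mirrors the second; the IN list is equivalent to chaining the conditions with OR.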
24. HIV results
● POS tagging with GeniaTagger
● Parsing with Stanford parser
  – Haven't used this data yet
● Working with sentences as units
● Normalising terms
● Tables of synonyms
● Tables of verb stems and terms
● Indexes with mention/offset pairs
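One way to picture an index of mention/offset pairs built from a synonym table is the sketch below. It is an assumption-laden illustration, not the project code: the SYNONYMS dictionary, its term IDs, and the build_index function are hypothetical, and matching is done with a plain regular expression rather than the taggers listed above.

```python
import re
from collections import defaultdict

# Hypothetical synonym table: surface form -> normalised term ID.
SYNONYMS = {
    "CD4+ T cell": 17,
    "CD4 T cell": 17,
    "CD4+ T": 28,
    "gp120": 5,
}


def build_index(sentences, synonyms):
    """Return {term_id: [(sentence_no, char_offset, mention), ...]}.

    Sentences are the unit of work; offsets are character positions
    within each sentence.
    """
    # Try longer surface forms first so "CD4+ T cell" wins over "CD4+ T".
    forms = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(form) for form in forms))
    index = defaultdict(list)
    for sentence_no, sentence in enumerate(sentences):
        for match in pattern.finditer(sentence):
            mention = match.group(0)
            index[synonyms[mention]].append((sentence_no, match.start(), mention))
    return dict(index)
```

Keeping the original mention next to its offset means the normalised term ID can be used for querying while the exact surface form stays available for display.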
25. HIV results
● Looking for sentences that share all these properties with any of the goal table rows
  – A human-HIV pair of GPN
  – A verb phrase containing a word with the same stem as the interaction verb
  – Any description term(s)
● Very high recall (few false negatives)
● Not-so-high precision (numerous false positives)
● Optimisation for more complicated queries
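The sentence filter described on this slide can be approximated as follows. This is a simplified sketch under several assumptions: the GoalRow fields and function names are hypothetical, names and terms are matched as lowercase substrings, and the "verb phrase" condition is reduced to any word that starts with the verb stem, so no parser output is used.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalRow:
    """One row of the goal table, reduced to what the filter needs."""
    hiv_names: frozenset          # surface forms of the HIV GPN
    human_names: frozenset        # surface forms of the human GPN
    verb_stem: str                # stem of the interaction verb, e.g. "bind"
    description_terms: frozenset  # terms taken from the interaction description


def contains_any(text, phrases):
    """True if any phrase occurs in the (already lowercased) text."""
    return any(phrase.lower() in text for phrase in phrases)


def sentence_matches(sentence, row):
    """Check the three properties: GPN pair, verb stem, description term."""
    text = sentence.lower()
    words = re.findall(r"[a-z0-9]+", text)
    has_pair = contains_any(text, row.hiv_names) and contains_any(text, row.human_names)
    # Prefix matching on the stem also catches nominal forms such as "binding".
    has_stem = any(word.startswith(row.verb_stem.lower()) for word in words)
    has_term = contains_any(text, row.description_terms)
    return has_pair and has_stem and has_term


def candidates(sentences, rows):
    """Return (sentence_no, row_no) pairs for every sentence/row match."""
    return [
        (i, j)
        for i, sentence in enumerate(sentences)
        for j, row in enumerate(rows)
        if sentence_matches(sentence, row)
    ]
```

Loose matching of this kind accepts any sentence where the three ingredients co-occur, whether or not the sentence actually asserts the interaction, which is consistent with the high recall and lower precision reported above.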
26. HIV next steps
● Compare with other PPI mining and GPN recognition tools
● Find optimum parameters
● Presentable results
● Integrate with the interaction ontology
● Evaluate, compare, present, get feedback
● Apply to new papers
● Apply to new organisms
● Evaluate, compare, present, get feedback...
27. Supervision
● Good points
  – Moving away from theory to tackling real problems very quickly
  – Micromanagement while I am free to manage my own time and other preferences
  – Planning ahead, causing commitment
  – Providing common sense, insight, and savvy
28. Supervision – good points (cont.)
  – Providing good starting points while not ruling out my own ideas
  – Good meeting frequency
    ● Group meetings?
  – General support
  – Addressing my needs
    ● Financial
    ● Research interests and preferences
29. Supervision
● Could be improved
  – Minutes were not always thorough
  – Same for task lists
  – We could have an agenda for the meetings
    ● I write a list of the things that I want to discuss each session
    ● Like the one I had for this report – could have been there when I presented my 3-week plan
  – Same for TEAM meetings and HIV meetings
● I hope we keep tackling real problems in future
30. Plan
● End of August
  – Presenting HIV output to the group
  – Writing HIV results
● Sep
  – Moving to new accommodation (11-20 Sep.)
  – Moving on HIV
    ● Applying the ontology
    ● Mining new corpora
    ● Generalising?
31. Plan
● Oct
  – Writing up HIV
  – Possible publication
  – Ideas for PhD research
● Nov
  – Finalise MPhil vs. PhD
  – Finalise PhD research area
  – Work on end of year report
● Dec
  – Write up EOY report
  – EOY Viva
32. References
● Ananiadou, Sophia, and John McNaught. 2006. Text Mining for Biology and Biomedicine. Norwood: Artech House, Inc.
● Spasić, Irena. Some Web Services relevant for biomedical applications. (Presentation slides.)
● Zhou, GuoDong, Jie Zhang, Jian Su, Dan Shen, and Chew-Lim Tan. 2004. Recognizing names in biomedical texts: a machine learning approach. Bioinformatics. Vol. 20 no. 7. Pp. 1178-1190.