My talk @ Smart Data Meetup in Munich: https://www.meetup.com/SmartData/events/237731342/
Learn how to build a modern NLP + deep learning pipeline with spaCy and Keras. Code samples here: https://github.com/trustyou/meetups/tree/master/smart-data
This document summarizes the key topics covered in Day 3 of a DL Chatbot seminar, including Seq2Seq models with attention mechanisms, advanced Seq2Seq architectures, and advanced attention mechanisms. The topics include RNN encoder-decoder models, attention scoring methods, hierarchical models, personalized embeddings, copying mechanisms, bidirectional attention, self-attention models like the Transformer, and various Seq2Seq implementations in PyTorch. Example papers and concepts relating to sequence generation, machine translation, image captioning, and question answering are referenced throughout.
Introduction to the new TensorFlow 2.x and the Coral AI Edge TPU hardware. The presentation introduces TensorFlow's main features, such as the Sequential and Functional APIs, mobile support with TensorFlow Lite, web support with TensorFlow.js, and Google Cloud support with TFX.
In addition, the presentation introduces the new Edge TPU architecture from Coral AI, including its main hardware features and a description of the compilation flow.
@IndeedEng: Tokens and Millicents - technical challenges in launching Indeed... (indeedeng)
This document provides an overview of the technical challenges in launching Indeed's job search platform around the world. It discusses how Indeed handles tokenization and indexing of jobs in different languages, including challenges with Chinese, Japanese, and Korean text. It describes Indeed's approaches to language detection, stemming, and query expansion to improve recall and relevance across many international markets. Key techniques discussed include n-gram tokenization, Unicode blocking, Bayesian classification, term expansion maps separated from indexing, and rule-based stemming. The goal is to make Indeed's search system scalable, generic, and able to support comprehensive use cases for job searching in different languages and regions globally.
This document provides an introduction to building a web scraper using JavaScript. It discusses the speaker's background and Thinkful's mentorship programs. It then covers JavaScript basics like variables, arrays, and using JavaScript to interact with HTML elements. It demonstrates how to grab elements of a certain class and print their text. The document advertises Thinkful's flexible online programs and high job placement rates after graduation.
This document appears to be a presentation on programming the semantic web. It discusses some of the challenges with semantic web programming, such as the impedance mismatch between semantic web data and traditional programming. It presents an approach called LITEQ, which uses a Node Path Query Language (NPQL) to facilitate semantic web programming in a Visual Studio environment. This allows querying of semantic web data through autocompletion and compilation of queries to SPARQL to enable static typing and better integration with programming languages. The goal is to reduce costs for developers working with semantic web data.
The document provides tips on how recruiters can better manage hiring managers during the candidate matching and selection process. It suggests recruiters identify the hiring manager's needs, search for suitable candidates using the right keywords, and pitch candidate profiles that align with the roles while also highlighting potential alternative fits. The document also discusses common challenges faced by both candidates and hiring managers to provide context around expectations.
Python is a widely used programming language with a design philosophy that emphasizes code readability. The presentation covered Python installation, syntax, objects, conditions and loops, classes and functions, error handling, modules, working with files and databases. It also provided an overview of concepts in big data like volume, velocity and variety of data as well as Google Cloud tools for big data like BigQuery, Cloud Dataflow, Cloud Dataproc, and Cloud Datalab.
The document discusses the evolution of topics within the DevOps movement over time, including culture, automation, and monitoring. It notes how topics have shifted from specific tools like Puppet and Nagios to broader concepts like containers and microservices. The document also addresses challenges faced by operations teams in adopting new technologies, including pressure to use the latest tools, preexisting technical debt, and lack of time. It argues tools alone won't fix cultural issues and advocates focusing on core responsibilities rather than trying to manage every new technology.
DevTalks Cluj - Open-Source Technologies for Analyzing Text (Steffen Wenz)
There are great open-source technologies for NLP (NLTK), machine learning (gensim, scikit-learn) and distributed computation (Spark). So don't shy away from big ideas, and make use of these amazing technologies at your fingertips!
Pipeline as code for your infrastructure as Code (Kris Buytaert)
This document discusses infrastructure as code (IAC) and continuous delivery pipelines. It introduces Puppet as an open-source configuration management tool for defining infrastructure as code. It emphasizes treating infrastructure configuration like code by versioning it, testing it, and promoting changes through environments like development, test, and production. The document also discusses using Jenkins for continuous integration to test application and infrastructure code changes and building automated pipelines for packaging and deploying changes.
This document provides a summary of best practices for DevOps as outlined by Erik Osterman of Cloud Posse. It discusses practices across organizational structure, software development, infrastructure automation, monitoring and security. Some key best practices include: establishing a makers culture with uninterrupted focus time for developers; using containers for local development environments and tools; strict branch protection and pull requests for code changes; immutable infrastructure with infrastructure as code; actionable alerts and post-mortems for monitoring; and identity-aware access, temporary credentials, and multi-factor authentication for security. The document aims to share proven strategies that help achieve reliability, speed, ease of use and affordability of systems.
This document discusses the concepts of DevOps, SecOps, and DevSecOps. It describes how the traditional divisions between development, operations, and security can lead to problems, and how adopting a DevOps culture and practices like continuous integration, infrastructure as code, and automation can help break down silos. It emphasizes that DevSecOps is about collaboration, culture change, and bringing security practices into the development lifecycle from the beginning.
BDD Testing Using Godog - Bangalore Golang Meetup #32 (OpenEBS)
BDD uses natural language to describe the "desired behaviour" of the system, which can be understood by both the developer and the customer.
Demo of an existing BDD application using Godog, which is predominantly used with golang.
Meet a 100% R-based CRO - The summary of a 5-year journey (Adrian Olszewski)
This document summarizes a 5-year journey of using R as the sole statistical analysis software at a CRO. Some key points:
- Initially there were questions around whether R would be sufficient, what hidden costs there may be to using open source software, how to validate and organize the working environment, and which packages would be needed.
- After 5 years of experience, the CRO found that R mostly sufficed for their work but using open source "is not free" as time must be spent collecting tools, validating them, dealing with package failures, and reporting issues. This amounts to costs equivalent to commercial software licenses.
- The CRO developed a simple, automated, template-based workflow
Meet a 100% R-based CRO. The summary of a 5-year journey (Adrian Olszewski)
This document summarizes a 5-year journey of using R as the sole statistical programming language at a CRO. It discusses initial concerns about relying entirely on open-source R, the hidden costs of using free software, and how over 230 packages were eventually incorporated into the CRO's library. The automated, template-based workflow developed organizes analysis into regular R files with conventions for data, analysis, and report generation. Defining analysis tasks helped identify necessary tools and packages. Though challenges were faced, determination to improve R and flexibility of the software led to the decision to remain fully R-based.
The document is a presentation about using MongoDB with PHP development. It introduces the speaker and provides reasons why PHP developers should use MongoDB, including its document-oriented storage, indexing support, replication, querying and map-reduce capabilities. It discusses how MongoDB fits with PHP's object-oriented nature. It provides an e-commerce use case example and overview of using MongoDB with PHP frameworks and the MVC pattern. It encourages attendees to explore more online resources for using MongoDB and PHP.
Introduction to Python by Mohamed Hegazy. In these slides you will find some code samples; the slides were first presented at TensorFlow Dev Summit 2017 Extended by GDG Helwan.
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data... (Dr. Haxel Consult)
Applications of machine learning to NLP tasks receive a lot of attention today and have been shown to yield state-of-the-art results on a wide range of tasks. We describe several cases where machine learning is deployed productively under the usual constraints of real-world projects: real-world requirements, fast throughput, reasonably low training corpus size requirements, and high-quality results. What we observe is a general trend towards open source - our own components are open source as well. With the software mostly freely available, the key success criterion for many NLP projects today is therefore first and foremost the expertise required to combine, tune and apply open-source components.
Building an E-commerce website in MEAN stack (divyapisces)
This document provides an overview of building an eCommerce site using the MEAN stack. It begins with an introduction to JavaScript and then discusses the key components of the MEAN stack including Node.js, AngularJS, and MongoDB. It provides details on each component, their history, features, and how they work together. It emphasizes how MongoDB is well-suited for eCommerce applications due to its flexible schema and ability to store different product types within the same collection.
One of the main advantages of PHP is that it allows you and your company to build up projects in no time and with immediate feedback and business value. Sometimes, however, fast growth and unforeseen complexities can make your codebase more and more difficult to manage as time passes and new features are added. Domain-Driven Design can be an elegant solution to the problem, but introducing it in mid-to-large-sized projects is not always easy: you have to deal with difficulties at the technical, team and knowledge levels. This talk focuses on how to approach the change in your codebase and in your team mindset without breaking legacy code or stopping development in favor of never-ending refactoring sessions.
Delivering Powerful Technical Presentations
Giving a technical talk that seems completely natural, flows, and is deeply impactful is no accident. While it’s true there are those rare people who may have the ability to make it seem like they have a shortcut to the work, countless others will tell you there is no substitution for preparation, practice, and thought (and maybe the application of a few tips learned along the way).
For the last six years, I’ve had the privilege to chair technical software conferences in San Francisco, New York, and London. In Delivering Powerful Technical Presentations, I lean on that experience along with the patterns and practices for delivering technical talks found in Presentation Patterns: Techniques for Crafting Better Presentations.
You can expect discussion around:
* How to maximize your prep time and use deliberate practice
* Know your audience and techniques to engage them
* Patterns and anti-patterns of giving online technical talks
Whether you’re giving a technical presentation for the first time or your hundredth time, you will have questions and the more you know, the more comfortable you’ll be. The focus of the talk is to help you on your journey.
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne... (Neo4j)
With the torrent of data available to us on the Internet, it's been increasingly difficult to separate the signal from the noise. We set out on a journey with a simple directive: Figure out a way to discover emerging technology trends. Through a series of experiments, trials, and pivots, we found our answer in the power of graph databases. We essentially built our "Emerging Tech Radar" on emerging technologies with graph databases being central to our discovery platform. Using a mix of NoSQL databases and open source libraries we built a scalable information digestion platform which touches upon multiple topics such as NLP, named entity extraction, data cleansing, cypher queries, multiple visualizations, and polymorphic persistence.
Breaking Through The Challenges of Scalable Deep Learning for Video Analytics (Jason Anderson)
Meetup Link: https://www.meetup.com/Cognitive-Computing-Enthusiasts/events/250444108/
Recording Link: https://www.youtube.com/watch?v=4uXg1KTXdQc
When developing a machine learning system, the possibilities are limitless. However, with the recent explosion of Big Data and AI, there are more options than ever to filter through: which technologies to select, which model topologies to build, and which infrastructure to use for deployment, just to name a few. We have explored these options for our faceted refinement system for video content (consisting of 100K+ videos) along with their many roadblocks. Three primary areas of focus involve natural language processing, video frame sampling, and infrastructure deployment.
1) The document discusses a presentation about Go and microservices given by Andrea Di Persio, a backend engineer at SoundCloud.
2) It covers an introduction to Go as a programming language, how SoundCloud uses Go and microservices in their infrastructure and applications, and how SoundCloud implements microservices using Go.
3) Some benefits of using Go and microservices at SoundCloud include isolated services that are easier to reason about and deploy independently while still being able to experiment and take ownership of specific domains.
Production process presentation - DrupalCamp Toronto 2010 (Aidan Foster)
This document provides an overview of Aidan Foster's presentation on how to plan and project manage a small to medium sized Drupal website. The presentation covers establishing team roles, creating proposals, planning through audience personas and content audits, visual design, production, and launching the site. It recommends tools for local development environments, version control, and project management. The goal is to make decisions early through simple means like paper to control costs and scope as the project progresses.
Compilers have been improving programmer productivity ever since IBM produced the first FORTRAN compiler in 1957. Today, we mostly take them for granted but even after more than 60 years, compiler researchers and practitioners continue to push the boundaries for what compilers can achieve as well as how easy it is to leverage the sophisticated code bases that encapsulate those six decades of learning in this field. In this talk, I want to highlight how industry trends like the migration to cloud infrastructures and data centers as well as the rise of flexibly licensed open source projects like LLVM and Eclipse OMR are paving the way towards even more effective and powerful compilation infrastructures than have ever existed: compilers with the opportunity to contribute to programmer productivity in even more ways than simply better hardware instruction sequences, and with simpler APIs so they can be readily used in scenarios where even today's most amazing Just In Time compilers are not really practical.
The document discusses LinkedIn's adoption of the Dust templating language in 2011. Some key points:
- LinkedIn needed a unified view layer as different teams were using different templating technologies like JSP, GSP, ERB.
- They evaluated 26 templating options and selected Dust as it best met their criteria like performance, i18n support, and being logic-less.
- Dust templates are compiled to JavaScript for client-side rendering and to Java for server-side rendering (SSR) through Google's V8 engine, allowing templates to work on both client and server.
- SSR addresses challenges like SEO, supporting clients without JavaScript, and i18n by rendering
Is this good Python? PyCon WEB 2017 Lightning Talk (Steffen Wenz)
Lightning talk I gave at https://pyconweb.com/ about how my Python idioms changed over the years, and how trying to write smart (but unreadable) code is bad :)
Powered by Python - PyCon Germany 2016 (Steffen Wenz)
The document discusses how TrustYou uses Python and machine learning techniques like word embeddings and document classification to analyze over 100 million hotel reviews and provide summarizations to travelers. It also provides an overview of TrustYou's architecture, which uses Hadoop and Spark to process large amounts of review data and power their analytics using Python libraries for natural language processing and machine learning. The company is hiring data engineers and web developers to continue expanding their platform.
DevTalks Cluj - Predictions for Machine Learning in 2020 (Steffen Wenz)
Experts predict that by the end of the century, strong artificial intelligence may be developed, potentially leading to an intelligence explosion and technological singularity. Machine learning and deep learning techniques are disrupting labor markets as intelligent machines replace human jobs. By 2020, deep learning methods are expected to continue conquering new fields like medicine and genetics, outperforming other machine learning approaches. Smart assistants using deep learning may become part of everyday life, and human-computer interaction may advance to dialog-style exchanges. However, these are predictions made by humans, who have historically been biased when predicting AI developments far into the future.
Helping travelers make better hotel choices - 500 million times a month
TrustYou analyzes online hotel reviews to create a summary for every hotel in the world. What do travelers think of the service? Is this hotel suitable for business travelers? TrustYou data is integrated on countless websites (Trivago, Wego, Kayak), helping travelers make better choices. Try it out yourself on http://www.trust-score.com/
TrustYou runs almost exclusively on Python. Every week, we find 3 million new hotel reviews on the web, process them, analyze the text using Natural Language Processing, and update our database of 600,000 hotels. In this talk, Steffen will give insights into how Python is used at TrustYou to collect, analyze and visualize these large amounts of data.
Cluj Big Data Meetup - Big Data in Practice (Steffen Wenz)
At the Cluj Big Data Meetup, we shared some insights into TrustYou's big data tech stack. We also introduced two tools which we've found useful in our production jobs: Apache Pig and Luigi.
Also check out the code samples on GitHub: https://github.com/trustyou/meetups/tree/master/big-data
Slides for the Cluj.py meetup where we explored the inner workings of CPython, the reference implementation of Python. Includes examples of writing a C extension to Python, and introduces Cython - ultimately the sanest way of writing C extensions.
Also check out the code samples on GitHub: https://github.com/trustyou/meetups/tree/master/python-c
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your costs through an optimized configuration and keep them low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Infrastructure Challenges in Scaling RAG with Custom AI models (Zilliz)
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features that provide convenience and capability sacrifice security. This best practices guide outlines steps users can take to better protect personal devices and information.
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence (IndexBug)
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, what a Lego brick and the XZ backdoor might have in common is that they are both building blocks, or dependencies, of creative and software projects. The reality is that a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to immerse yourself in a story of interoperability, standards and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations and training activities. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko she cultivates her curiosity about astronomy (which is where her nickname deneb_alpha comes from).
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda, German-language webinar)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefits it brings you. Above all, you certainly want to stay within your budget and save costs wherever possible. We understand that, and we want to help you with it!
We will show you how to resolve common configuration problems that can lead to more users being counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some setups that can lead to unnecessary expenses, for example when a person document is used instead of a mail-in for shared mailboxes. We will show you such cases and their solutions. And of course we will explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It will give you the tools and the know-how to keep track of what is going on. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to make the best use of it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
4. ✓ Excellent hotel!
✓ Nice building
“Clean, hip & modern, excellent facilities”
✓ Great view
« Vue superbe »
5. ✓ Excellent hotel!*
✓ Nice building
“Clean, hip & modern, excellent facilities”
✓ Great view
« Vue superbe »
✓ Great for partying
“Nice weekend getaway or for partying”
✗ Solo travelers complain about TVs
ℹ You should check out Reichstag, KaDeWe & Gendarmenmarkt.
*) nhow Berlin (Full summary)
9. steffen@trustyou.com
● Studied CS here in Munich
● Joined TrustYou in 2008 as a working student …
● First product manager, then CTO since 2012
● Manages a very diverse tech stack and a team of 30 engineers:
○ Data engineers
○ Data scientists
○ Web developers
10. TrustYou Architecture
TrustYou ♥ Spark + Python
(Architecture diagram: NLP, Text Generation, Machine Learning, Aggregation, Crawling API)
3M new reviews per week!
12. Typical NLP Pipeline
(Pipeline diagram: raw text goes through tokenization, sentence splitting, part of speech tagging and parsing, producing structured data!)
13. spaCy
● NLP library
● Implements NLP pipelines for English, German + others
● Focus on performance and production use
○ Largely implemented in Cython … heard of it? :)
● Plays well with machine learning libraries
● Unlike NLTK, which is more for educational use, and sees few updates these days …
14. import spacy

nlp = spacy.load("en")
doc = nlp("This hotel is truly huge and beautiful. I'll be back for sure")

for word in doc:
    print(word)
15. doc = nlp("I'll code code")

for word in doc:
    print(word.text, word.lemma_, word.pos_)

# I -PRON- PRON
# 'll will VERB
# code code VERB
# code code NOUN
17. Semantic Analysis at TrustYou
● “Nice room”
● “Room wasn't so great”
● “อาหารรสชาติดี”
● “ﺟﯾدة ﺧدﻣﺔ”
● Custom NLP framework, extension of NLTK
● Supports 20 languages natively!
● Custom, domain-specific tagging and parsing
18. Let’s do some ML!
Hm, how to model text as input for ML?
● Enter word vectors!
● Goal: Find a mapping word → high-dimensional vector, where similar words have vectors close together
● “Woman” is close to “lady” is close to “womna”
● Word2vec is an algorithm to produce such embeddings
19. woman, lady, dude = nlp("woman lady dude")
woman.similarity(lady) # 0.78
woman.similarity(dude) # 0.40
● Word2vec considers words to be similar if they occur in similar contexts, i.e. typically have the same words before/after them
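The slides use spaCy's pre-trained vectors, as shown above. Purely as an illustration of how such embeddings can be produced from a corpus, here is a minimal gensim sketch; it is not from the talk, and the toy corpus, hyperparameters and gensim 4.x API are assumptions:

```python
from gensim.models import Word2Vec

# Toy corpus: in practice this would be millions of tokenized review sentences.
sentences = [
    ["the", "woman", "liked", "the", "room"],
    ["the", "lady", "liked", "the", "room"],
    ["the", "dude", "trashed", "the", "minibar"],
]

# Train word vectors; words that appear in similar contexts end up close together.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

# Words sharing contexts ("woman"/"lady") should score higher than ones that don't.
print(model.wv.similarity("woman", "lady"))
print(model.wv.similarity("woman", "dude"))
```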
21. (Somewhat Pointless) Application
Goal: Predict review overall score just from title!
(Diagram of a small neural network: the input layer holds the word vectors, the output layer is the review score, so just one node.)
Training = rejiggering the weights of these arrows, trying to closely match training data
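The transcript does not show how titles are turned into network inputs; one plausible way, offered here only as an assumption, is to use spaCy's document vector, which averages the word vectors of the tokens:

```python
import numpy as np
import spacy

nlp = spacy.load("en")  # same model as in the earlier slides

def title_to_vector(title):
    # Doc.vector is the average of the token vectors: one fixed-size input per title.
    return nlp(title).vector

X = np.stack([title_to_vector(t) for t in ["Perfect", "Could have been better"]])
print(X.shape)  # (2, vector dimensionality)
```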
22. ML 10 years ago
● Work goes into feature
engineering
● Bigram models, POS
tags, parse trees …
whatever helps
Deep learning now
● Big NNs capture lots of
complexity … can work
directly on raw data
● Bad news for domain
experts :’(
23. Keras
● High-level machine learning library
● API for defining neural network architecture
● Training & prediction is done in a backend:
○ Tensorflow
○ Theano
○ …
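The transcript jumps from here to slide 26, so the actual model definition is not shown. Below is a minimal sketch of what a Keras regression model over averaged word vectors could look like; layer sizes, optimizer and training details are assumptions, not taken from the talk:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

VECTOR_SIZE = 300  # assumed dimensionality of the word vectors used as input

# One averaged word vector per review title in, one overall score (0-100) out.
model = Sequential([
    Dense(64, activation="relu", input_shape=(VECTOR_SIZE,)),
    Dense(1),  # single output node: the predicted review score
])
model.compile(optimizer="adam", loss="mse")

# Toy data just to make the sketch runnable; the talk trained on 1M real review titles.
X = np.random.rand(256, VECTOR_SIZE)
y = np.random.rand(256) * 100
model.fit(X, y, epochs=2, batch_size=32)
print(model.predict(X[:1]))
```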
26. Let’s try our model:
“Perfect” → 97
“Beautiful hotel” → 95
“Good hotel” → 84
“Could have been better” → 65
“Hotel was not beautiful …” → 51
“Right in the middle of Munich” → 89
“Right in the middle of Bagdad” → 89
Trained on 1M review titles.
Mean squared error: 12/100
33. Spark
● Distributed computing framework
● User writes driver program which transparently schedules execution in a cluster
● Faster and more expressive than MapReduce
34. Let’s try Spark!
$ # how old is the C code in CPython?
$ git clone https://github.com/python/cpython && cd cpython
$ find . -name "*.c" -exec git blame {} \; > blame
$ head blame
dc5dbf61 (Guido van Rossum 1991-02-19 12:39:46 +0000 1)
daadddf7 (Guido van Rossum 1990-10-14 12:07:46 +0000 2) /* List a no
daadddf7 (Guido van Rossum 1990-10-14 12:07:46 +0000 3)
badc12f6 (Guido van Rossum 1990-12-20 15:06:42 +0000 4) #include "pg
daadddf7 (Guido van Rossum 1990-10-14 12:07:46 +0000 5) #include "to
daadddf7 (Guido van Rossum 1990-10-14 12:07:46 +0000 6) #include "no
daadddf7 (Guido van Rossum 1990-10-14 12:07:46 +0000 7)
badc12f6 (Guido van Rossum 1990-12-20 15:06:42 +0000 8) /* Forward *
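The transcript skips the slides that actually process this file with Spark, so here is a sketch of what that analysis might look like; the parsing regex, file path and aggregation are assumptions, not code from the talk:

```python
import re
from pyspark import SparkContext

sc = SparkContext(master="local[*]", appName="cpython-blame")

# Each blame line looks like: "<commit> (<author> <date> <time> <tz> <lineno>) <code>"
PATTERN = re.compile(r"^\w+\s+\((.+?)\s+(\d{4})-\d{2}-\d{2}")

def parse(line):
    # Extract (author, year) from a blame line, or None if it doesn't match.
    m = PATTERN.match(line)
    return (m.group(1).strip(), int(m.group(2))) if m else None

lines = sc.textFile("blame")  # the file produced by the git blame command above

# Count how many C lines were last touched in each year.
by_year = (lines.map(parse)
                .filter(lambda x: x is not None)
                .map(lambda author_year: (author_year[1], 1))
                .reduceByKey(lambda a, b: a + b))

for year, count in sorted(by_year.collect()):
    print(year, count)
```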
38. Luigi
● Build complex pipelines of batch jobs
○ Dependency resolution
○ Parallelism
○ Resume failed jobs
39. import luigi

class MyTask(luigi.Task):

    def output(self):
        return luigi.LocalTarget("/to/make/this/file")

    def requires(self):
        return [
            INeedThisTask(),
            AndAlsoThisTask("with_some arg")
        ]

    def run(self):
        # ... then ...
        # I do this to make it!
        pass
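The slides end here. For context, a task like the one above is typically run either from the command line or programmatically; this is a minimal sketch, assuming the class lives in a module my_tasks.py and its dependencies are defined:

```python
import luigi

# Programmatic run with the in-process scheduler; the command-line equivalent
# would be: python -m luigi --module my_tasks MyTask --local-scheduler
if __name__ == "__main__":
    luigi.build([MyTask()], local_scheduler=True)
```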