Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 22nd
Abstract. Generative adversarial networks (GANs) are among the most popular models capable of producing high-quality images. However, most works generate images from a vector of random values, without explicit control over the desired output properties. We study ways of introducing such control for a user-selected region of interest (RoI). First, we survey and analyze existing work on image completion (inpainting) and controllable generation. Second, we propose a GAN-based model for controllable local content generation that unites approaches from the two mentioned areas. Third, we evaluate the controllability of our model on three publicly available datasets, CelebA, Cats, and Cars, and report numerical and visual results of our method.
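The local-control idea can be illustrated by how an inpainting-style model composes its output: generated content is blended into the user-selected RoI while pixels outside the mask are kept. A minimal NumPy sketch of our own (the `generated` array is a stand-in for a real generator's output, not the thesis model):

```python
import numpy as np

def compose_roi(image, generated, mask):
    """Blend generated content into the region of interest.

    mask is 1.0 inside the RoI and 0.0 outside; all arrays share shape (H, W, C).
    """
    return mask * generated + (1.0 - mask) * image

image = np.zeros((4, 4, 1))       # original content
generated = np.ones((4, 4, 1))    # stand-in for a generator's output
mask = np.zeros((4, 4, 1))
mask[1:3, 1:3] = 1.0              # user-selected 2x2 RoI

out = compose_roi(image, generated, mask)
print(out[..., 0])  # ones inside the RoI, zeros outside
```

The same composition is what lets such a model leave the rest of the image untouched while the user controls only the masked region.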
This topic was presented by Alexandru Arion at the 54th annual IEEE Global Communications Conference (GLOBECOM 2011), 5-9 December 2011, in Houston, Texas.
Publication: http://bit.ly/A3iKbv
Abstract:
The trend toward more online linked data is becoming stronger. Foreseeing a future where "everything" will be online and linked, we ask the critical question: what is next? We envision that managing, querying, and storing large amounts of links and data is far from yet another query-processing task. We highlight two distinct and promising research directions toward managing and making sense of linked data. We introduce linked views, to help focus on specific link and data instances, and linked history, to help observe how links and data change over time.
Indexing Data on the Web: A Comparison of Schema-Level Indices for Data Search (Till Blume)
Indexing the Web of Data offers many opportunities, in particular, to find and explore data sources. One major design decision when indexing the Web of Data is to find a suitable index model, i.e., how to index and summarize data. Various efforts have been conducted to develop specific index models for a given task. With each index model designed, implemented, and evaluated independently, it remains difficult to judge whether an approach generalizes well to another task, set of queries, or dataset. In this work, we empirically evaluated six representative index models independent of their original application for a data search scenario.
In this presentation, I discuss how to implement a decision tree machine learning model from scratch in Python. This implementation does not use any machine learning libraries such as scikit-learn.
After completing this tutorial, you will not only learn the fundamentals of the decision tree algorithm but also be able to implement a decision tree model from scratch.
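The core of such a from-scratch implementation fits in a few functions: Gini impurity, an exhaustive best-split search, recursive tree building, and prediction. This is our own minimal sketch, not the presenter's code:

```python
# Minimal decision-tree classifier from scratch (no ML libraries).

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows, labels):
    """Find the (feature, threshold) pair minimising weighted Gini impurity."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels, depth=0, max_depth=3):
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return max(set(labels), key=labels.count)  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] >= t]
    return (f, t,
            build([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] < t else right
    return tree

# Tiny separable example.
X = [[1.0], [2.0], [8.0], [9.0]]
y = [0, 0, 1, 1]
tree = build(X, y)
print(predict(tree, [1.5]), predict(tree, [8.5]))  # -> 0 1
```

The exhaustive split search is quadratic in the number of samples per feature; real implementations sort once per feature, but the logic above is the same.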
TMS workshop on machine learning in materials science: Intro to deep learning... (Brian DeCost)
This presentation is intended as a high-level introduction to deep learning and its applications in materials science. The intended audience is materials scientists and engineers.
Disclaimers: the second half of this presentation is intended as a broad overview of deep learning applications in materials science; due to time limitations it is not intended to be comprehensive. As a review of the field, this necessarily includes work that is not my own. If my own name is not included explicitly in the reference at the bottom of a slide, I was not involved in that work.
Any mention of commercial products in this presentation is for information only; it does not imply recommendation or endorsement by NIST.
Classifying hot water chemistry: Application of multivariate statistics (Dasapta Erwin Irawan)
The following paper is a trial application of multivariate analysis (regression tree, principal component analysis, and cluster analysis) for classifying hot water chemistry. Eleven samples were analysed (including three cold water samples), taken from three Gorontalo geothermal sites (Boalemo, Pohuwato, and Gorontalo Regency).
The regression tree technique failed to read the data structure due to collinearity effects; therefore, PCA and cluster analysis were applied. We used open-source R statistical packages for the calculations.
This technique classifies the hot water samples into three major clusters: cluster 1 (hot water from Diloniyohu-Boalemo), cluster 2 (combining hot water from Tungo and Dulangeya-Boalemo and cold water from Dulangeya-Boalemo), and cluster 3 (cold water from Pohuwato and Diloniyohu-Boalemo). According to the results, hot water from Boalemo comprises two systems, a distinct geothermal system and a system mixing with meteoric water, while hot water from Pohuwato shows little or no mixing with meteoric water.
The statistical approach is able to detect closed and open geothermal systems based on the data structure. This robust method should be applied to more geothermal systems with larger datasets to assess its performance.
Automation of (Biological) Data Analysis and Report Generation (Dmitry Grapov)
I've been experimenting with automating simple and complex data analysis and report generation tasks for biological data, mostly using R and LaTeX. You can see some of my progress and the challenges encountered.
A deep dive into Magento 2's core functionality specifically around service contracts, repositories and data persistence. This also discusses the future of Magento 2's architecture.
Most work on scholarly document processing assumes that the information processed is trustworthy and factually correct. However, this is not always the case. There are two core challenges, which should be addressed: 1) ensuring that scientific publications are credible -- e.g. that claims are not made without supporting evidence, and that all relevant supporting evidence is provided; and 2) that scientific findings are not misrepresented, distorted or outright misreported when communicated by journalists or the general public. I will present some first steps towards addressing these problems and outline remaining challenges.
Towards a new hybrid approach for building document-oriented data warehouses (IJECE/IAES)
Schemaless databases offer large storage capacity while guaranteeing high performance in data processing, unlike relational databases, which are rigid and have shown their limitations in managing large amounts of data. However, the absence of a well-defined schema and structure in not-only-SQL (NoSQL) databases makes using the data for decision-analysis purposes even more complex and difficult. In this paper, we propose an original approach to building a document-oriented data warehouse from unstructured data. The new approach follows a hybrid paradigm that combines data analysis and user-requirements analysis. The first, data-driven step exploits the fast, distributed processing of the Spark engine to generate a general schema for each collection in the database. The second, requirement-driven step consists of analyzing the semantics of the decisional requirements expressed in natural language and mapping them to the collections' schemas. At the end of the process, a decisional schema is generated in JavaScript Object Notation (JSON) format, and the data is loaded with the necessary transformations.
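The data-driven step can be illustrated in miniature: merge the field names and observed value types of each document into one general schema per collection. A toy sketch in plain Python (the paper does this at scale with the Spark engine; the names here are our own):

```python
import json

def infer_schema(docs):
    """Merge field names and observed value types across documents."""
    schema = {}
    for doc in docs:
        for field, value in doc.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in schema.items()}

# Schemaless collection: documents may have different fields and types.
collection = [
    {"order_id": 1, "amount": 19.9, "customer": "alice"},
    {"order_id": 2, "amount": 5, "shipped": True},
]
general_schema = infer_schema(collection)
print(json.dumps(general_schema, sort_keys=True))
```

Note how the merged schema records that `amount` was seen as both `float` and `int`; resolving such type conflicts is one of the things a real schema-generation step must decide.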
A Benchmark for the Use of Topic Models for Text Visualization Tasks - Online... (Matthias Trapp)
Presentation of research paper "A Benchmark for the Use of Topic Models for Text Visualization Tasks" at the 15th International Symposium on Visual Information Communication and Interaction in Chur, Switzerland.
Learning to Rank Image Tags With Limited Training Examples (1crore projects)
Advances in this technological era have resulted in an enormous pool of information, stored at multiple places globally and in multiple formats. This article presents a methodology for extracting video lectures delivered by experts in the domain of Computer Science using a Generalized Gamma Mixture Model. The feature extraction is based on DCT transformations. To build the model, the dataset was pooled from YouTube video lectures in the Computer Science domain. The outputs generated are evaluated using precision and recall.
Modern Database Management 12th Global Edition by Hoffer: solution manual (ssuserf63bd7)
Focusing on what leading database practitioners say are the most important aspects to database development, Modern Database Management presents sound pedagogy, and topics that are critical for the practical success of database professionals. The 12th Edition further facilitates learning with illustrations that clarify important concepts and new media resources that make some of the more challenging material more engaging. Also included are general updates and expanded material in the areas undergoing rapid change due to improved managerial practices, database design tools and methodologies, and database technology.
EMR: A Scalable Graph-based Ranking Model for Content-based Image Retrieval (1crore projects)
Content-based image retrieval (CBIR) uses content features for retrieving and searching images in a given large database. Earlier, hand-crafted feature descriptors were designed based on visual cues such as shape, colour, and texture to represent these images. However, deep learning technologies have been widely applied as an alternative to the feature engineering that was dominant for over a decade: the features are learnt automatically from the data. This research work proposes an integrated dual deep convolutional neural network (IDD-CNN). IDD-CNN comprises two distinct CNNs: the first CNN extracts generic features, and a further custom CNN is designed to extract custom features. Moreover, a novel directed graph is designed that comprises two blocks, a learning block and a memory block, which helps in finding the similarity among images; since this research considers a large dataset, an optimal strategy is introduced for compact features. IDD-CNN is evaluated on two distinct benchmark datasets, including the Oxford dataset, using the mean average precision (mAP) metric, and comparative analysis shows that IDD-CNN outperforms existing models.
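The retrieval side of such a dual-branch design can be sketched as follows: features from two extractors are concatenated into one descriptor, and database images are ranked by cosine similarity to the query. The random-projection "branches" below are our own placeholders, not the paper's IDD-CNN:

```python
import numpy as np

rng = np.random.default_rng(0)
branch_a = rng.normal(size=(16, 8))   # placeholder generic-feature branch
branch_b = rng.normal(size=(16, 8))   # placeholder custom-feature branch

def describe(x):
    """Concatenate the two branches' features into one compact descriptor."""
    return np.concatenate([x @ branch_a, x @ branch_b])

def rank(query, database):
    """Return database indices sorted by cosine similarity to the query."""
    q = describe(query)
    scores = []
    for i, img in enumerate(database):
        d = describe(img)
        cos = float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
        scores.append((cos, i))
    return [i for _, i in sorted(scores, reverse=True)]

database = [rng.normal(size=16) for _ in range(5)]
query = database[2] + 0.01 * rng.normal(size=16)  # near-duplicate of image 2
print(rank(query, database)[0])  # the near-duplicate ranks first
```

In a real system the descriptors would be precomputed and indexed; the point here is only how two feature sources combine into one ranking signal.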
A scalable Gibbs sampler for probabilistic entity linking (Sunny Kr)
Entity linking involves labeling phrases in text with their referent entities, such as Wikipedia or Freebase entries. This task is challenging due to the large number of possible entities, in the millions, and heavy-tailed mention ambiguity. We formulate the problem in terms of probabilistic inference within a topic model, where each topic is associated with a Wikipedia article. To deal with the large number of topics we propose a novel efficient Gibbs sampling scheme which can also incorporate side information, such as the Wikipedia graph. This conceptually simple probabilistic approach achieves state-of-the-art performance in entity-linking on the Aida-CoNLL dataset.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 24th
Abstract. In digital marketing, memes have become an attractive tool for engaging an online audience. Memes have an impact on buyers' and sellers' online behavior and on information-spreading processes. Thus, the technology for generating memes is a significant tool for social media engagement. In this study, we collected a new memes dataset of ∼650K meme instances, applied a state-of-the-art deep learning technique, the GPT-2 model [1], to meme generation, and compared machine-generated memes with human-created ones. We showed that MTurk workers can be used for approximate estimation of users' behavior in a social network, more precisely to measure engagement. Generated memes cause the same engagement as human memes that historically collected no engagement in the social network. Still, generated memes are less engaging than random memes created by humans.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 23rd
Abstract. Nowadays, the synthesis of human images and videos is arguably one of the most popular topics in the data science community. The synthesis of human speech is less trendy but deeply bonded to that topic. Since the publication of the WaveNet paper by Google researchers in 2016, the state-of-the-art approach has shifted from parametric and concatenative systems to deep learning models. Most work in the area focuses on improving the intelligibility and naturalness of the speech. However, almost every significant study also mentions ways to generate speech with the voices of different speakers. Usually, such an enhancement requires re-training the model to generate audio with the voice of a speaker that was not present in the training set. Additionally, studies focused on highly modular speech generation are rare. Therefore, there is room left for research on ways to add new parameters for other aspects of the speech, like sentiment, prosody, and melody. In this work, we aimed to implement a competitive text-to-speech solution with the ability to specify the speaker without model re-training, and to explore possibilities for adding emotions to the generated speech. Our approach generates good-quality speech with a mean opinion score of 3.78 (out of 5) points and the ability to mimic a speaker's voice in real time, which is a big improvement over the baseline, which merely obtains 2.08. On top of that, we researched sentiment representation possibilities. We built an emotion classifier that performs on the level of the current state-of-the-art solutions, giving an accuracy of more than eighty percent.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 24th
Abstract. This work presents a context-based question answering model for the Ukrainian language, built on Wikipedia articles using the Bidirectional Encoder Representations from Transformers (BERT) model (Devlin et al., 2018), which takes a context (a Wikipedia article) and a question about that context. The output of the model is an answer to the question. The model consists of two parts. The first is a pre-trained multilingual BERT model, trained on Wikipedia articles in the 100 most popular languages. The second is the fine-tuned model, trained on a dataset of questions and answers about Wikipedia articles. The training and validation data is the Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016). There are no question answering datasets for the Ukrainian language. The plan is to build an appropriate dataset with machine translation, use it for the fine-tuning stage, and compare the result with models fine-tuned on other languages. The next experiment is to train a model on Slavic-language datasets before fine-tuning on Ukrainian and compare the results.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 24th
Abstract. This work tackles the problem of matching Wikipedia red links with existing articles. Links in Wikipedia pages are considered red when they lead to nonexistent articles. In other Wikipedia editions, there may exist articles that correspond to such red links. In our work, we propose a way to match red links in one Wikipedia edition to existing pages in another edition. We solve this task in the context of Ukrainian red links and existing English pages. We created a dataset of the 3,171 most frequent Ukrainian red links and a dataset of 2,957,927 pairs of red links and the most probable candidate pages in English Wikipedia. This dataset is publicly released. We defined the task as a named entity linking problem: red links are named entities, and we link Ukrainian red links to English Wikipedia pages. In this work, we provide a thorough analysis of the data and define its conceptual characteristics to exploit in entity resolution. These characteristics are graph properties (connections with the pages where red links occur, and connections with the pages which occur in the same pages as red links) and word properties (title names). The BabelNet knowledge base was applied to this task. We evaluated its power in terms of F1 score (29%) and regarded it as the baseline for our approach. To improve the results, we introduced several similarity metrics based on the mentioned red link characteristics. Combined in a linear model, they resulted in an F1 score of 85%, which is our best result. In our thesis, we also discuss the bottlenecks and limitations of the current approach and outline ideas for future improvements. To the best of our knowledge, we are the first to state the problem and propose a solution for red links in the Ukrainian Wikipedia edition. All the code for this project is publicly released on GitHub.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 24th
Abstract. Every day, visitors leave countless reviews about hotels, restaurants, cafes, attractions, and other services. In most cases they rate the service, and sometimes they also rate a specific topic if the service provides this possibility. However, the main information about the user's opinion is hidden inside the body of the review text. Therefore, in this work we propose a solution to analyze one or several user reviews, determine sentiments, and extract important characteristics from these reviews. We determine which characteristics were influenced by such reviews. The proposed solution can detect sentiments in text and classify them as positive or negative. It then extracts the top positive and negative phrases, which can explain why the user left such a review. Besides, we analyze all reviews about one hotel, or just several reviews, and summarize the most important positive and negative properties for a specific hotel.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 23rd
Abstract. Advances in demand response for energy imbalance management (EIM) ancillary services can change future power systems. These changes are the subject of research in academia and industry. Although the application of machine learning methods is an important and promising part of this research, the power systems domain has not yet fully benefited from it. Thus, the main objective of the presented project is to investigate and assess opportunities for applying reinforcement learning (RL) to achieve such advances, by developing an intelligent voltage-control-based ancillary service that uses thermostatically controlled loads (TCLs). Two stages of the project are presented: a proof of concept (PoC) and extensions. The PoC includes modeling and training of a voltage controller using Q-learning, chosen for its efficiency without unnecessary sophistication. The simplest power system relevant for demand response, consisting of 20 TCLs, is considered in the experiments to provide the ancillary service. The power system model is developed with Modelica tools. The extensions aim to exceed the PoC performance by applying advanced RL methods: a Q-learning modification that uses a window of environment states as input (WIQL), smart discretization strategies for the environment's continuous state space, and a deep Q-network (DQN) with experience replay. To investigate the particularities of the developed controller, modifications of the experimental setup, such as testing the controller for longer than it was trained and using different simulation start times, are considered. An improvement of 4% in median performance is achieved compared to the competing analytical approach: optimal constant control chosen using a whole-time-interval simulation for the same voltage controller design.
The presented results and corresponding discussions can be useful both for further work on RL-driven voltage controllers for EIM and for other applications of RL in the power systems domain using Modelica models.
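The PoC's learning rule is standard tabular Q-learning; the toy sketch below applies it to a made-up discretized voltage-regulation task (the thesis environment is a Modelica model of 20 TCLs, not this one):

```python
import random

# Tabular Q-learning on a toy voltage task (illustrative only).
# States: discretized voltage levels 0..4, target level 2.
# Actions: 0 = lower voltage, 1 = hold, 2 = raise voltage.
random.seed(1)
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
Q = [[0.0] * 3 for _ in range(5)]

def step(state, action):
    nxt = max(0, min(4, state + (action - 1)))
    reward = -abs(nxt - 2)            # penalty for deviating from the target
    return nxt, reward

for _ in range(2000):
    s = random.randrange(5)           # episode starts at a random voltage level
    for _ in range(20):
        if random.random() < EPS:     # epsilon-greedy exploration
            a = random.randrange(3)
        else:
            a = max(range(3), key=lambda i: Q[s][i])
        nxt, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[nxt]) - Q[s][a])
        s = nxt

policy = [max(range(3), key=lambda i: Q[s][i]) for s in range(5)]
print(policy)  # low states raise, the target state holds, high states lower
```

The same update rule drives the real controller; the thesis work lies in the environment model, the state discretization, and the WIQL/DQN extensions on top of it.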
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 23rd
Abstract. Speaker classification is an essential task in the machine learning domain, with many practical applications in identification and natural language processing. This work concentrates on speaker classification as a subtask of general speaker diarization for real-world conversation scenarios. We survey the domain of modern speech processing and present an original speaker classification approach based on recent developments in convolutional neural networks. Our method uses a spectrogram as input to a CNN classifier model, allowing it to capture spatial information about the distribution of voice frequencies. The presented results show performance beyond human ability and give strong prospects for future development.
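The spectrogram input such a classifier consumes can be computed as a magnitude short-time Fourier transform. A minimal NumPy sketch of our own (the CNN itself and any training details are omitted):

```python
import numpy as np

def spectrogram(signal, frame=256, hop=128):
    """Magnitude STFT: one column of |FFT| per windowed frame."""
    window = np.hanning(frame)
    frames = [signal[i:i + frame] * window
              for i in range(0, len(signal) - frame + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T  # (freq_bins, time)

# Toy "voice": a 440 Hz tone, one second at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))

peak_bin = int(spec.mean(axis=1).argmax())
print(spec.shape, peak_bin * sr / 256)  # dominant frequency near 440 Hz
```

The resulting 2-D array is what lets a CNN treat frequency-over-time structure the way it treats spatial structure in images.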
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 23rd
Abstract. Currently, the active development of image processing methods requires large amounts of correctly labeled data. The lack of quality data makes it impossible to use various machine learning methods. When the possibilities for collecting real data are limited, methods for synthetic data generation are used. In practice, we can formulate the task of high-quality generation of synthetic images as the efficient generation of complex data distributions, which is the object of study of this work. Generating high-quality synthetic data is an expensive and complicated process with existing methods. We can distinguish two main approaches used to generate synthetic data: image generation based on rendered 3-D scenes, and the use of GANs for simple images. These methods have drawbacks, such as a narrow range of applicability and insufficient distribution complexity of the obtained data. When using GANs to generate complex distributions, in practice we face a visible increase in the complexity of the model architecture and the training procedure. A deep understanding of the complex distributions of real data can be used to improve the quality of synthetic generation. Minimizing the differences between the real and synthetic data distributions can not only improve the generation process but also provide tools for solving the problem of data scarcity in the field of image processing.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 23rd
Abstract. Customer Lifetime Value (CLV) is the present value of the future cash flows attributed to a customer during their entire relationship with the company (Farris et al., 2010). CLV represents a 360-degree view of the client’s business situation (McKinsey, Customer Lifecycle Management), which takes into account the probability of customer churn and their future purchases. Modeling CLV in retail is a complicated task due to the lack of access to historical purchase data, the difficulty of customer identification, and the difficulty of linking purchase history to a particular customer. In this research, historical transactional data were taken from twelve North American brick-and-mortar grocery stores to compare different approaches to CLV modeling in terms of segmentation and forecasting. The research has resulted in suggestions on CLV estimation for the offline retail business case, with the advantages and limitations of each approach.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 23rd
Abstract. In this project (Glusco and Maksymenko, 2019), we treat the Reinforcement Learning problem of Exploration vs. Exploitation. The problem can be rephrased in terms of generalization and overfitting, or efficient learning. To face the problem, we decided to combine techniques from different research works: we introduce noise as an environment characteristic (Packer et al., 2018); create a setup of multiple Reinforcement Learning agents and environments that train in parallel and interact with each other (Jaderberg et al., 2017); and use a parallel tempering approach to initialize environments with different temperatures (noise levels) and perform exchanges using the Metropolis-Hastings criterion (Pushkarov et al., 2019). We implemented a multi-agent architecture with a parallel tempering approach based on two different Reinforcement Learning agent algorithms – Deep Q Network and Advantage Actor-Critic – and an environment wrapper over the OpenAI Gym (Gym: A toolkit for developing and comparing reinforcement learning algorithms) environment for noise addition. We used the CartPole environment to run multiple experiments with three different types of exchanges: no exchange, random exchange, and smart exchange according to the Metropolis-Hastings rule. We implemented aggregation functionality to gather the results of all the experiments and visualize them with charts for analysis. Experiments showed that a parallel tempering approach with multiple environments with different noise levels can improve the performance of the agent under specific circumstances. At the same time, the results raised new questions that should be addressed to fully understand the picture of the implemented approach.
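The Metropolis-Hastings exchange step mentioned above can be sketched as a standard parallel-tempering acceptance rule. The abstract does not specify the exact quantities exchanged, so treating the noise level as temperature and the negative recent return as energy is an illustrative assumption, not the project's exact formulation:

```python
import math
import random

def swap_probability(return_i, return_j, noise_i, noise_j):
    """Metropolis-Hastings acceptance probability for swapping two
    replicas: noise level plays the role of temperature, negative
    recent return plays the role of energy (higher return = lower
    energy), giving min(1, exp((1/T_i - 1/T_j) * (r_j - r_i)))."""
    delta = (1.0 / noise_i - 1.0 / noise_j) * (return_j - return_i)
    return min(1.0, math.exp(delta))

def maybe_swap(agents, i, j, rng=random):
    """Swap agents between environments i and j if the rule accepts."""
    p = swap_probability(agents[i]["return"], agents[j]["return"],
                         agents[i]["noise"], agents[j]["noise"])
    if rng.random() < p:
        agents[i], agents[j] = agents[j], agents[i]
        return True
    return False
```

With equal returns the acceptance probability is exactly 1, so exchanges between equally performing replicas are always allowed; a large return gap against the temperature gradient makes a swap exponentially unlikely.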
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 22nd
Abstract. The thesis introduces the reader to the concepts of edge computing in the context of the person re-identification and tracking problem. It describes the challenges, limitations, and current state-of-the-art solutions. The author proposes a pipeline for the task, runs several experiments validating different parts of the system, and provides a theoretical explanation of the person re-identification process in an overlapping multi-camera environment.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 22nd
Abstract. Generative Adversarial Networks (GANs) have certainly become one of the biggest trends in the computer vision domain in recent years. GANs are used for generating face images and computer game scenes, transferring artwork style, visualizing designs, creating super-resolution images, translating text to images, etc. We present a model to solve an image generation problem: generating new outfits on images of people. This task is extremely important for the offline/online trade and fashion industry. Changing clothing on images of people is not a trivial task. The generated part of the image should have high quality, without blurring. Another problem is generating long sleeves on images with T-shirts, for example. As a result, well-known models are not suitable for this task. In this master project, we reproduce a model for changing clothing on images of people based on existing approaches and improve it in order to obtain better image quality.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 22nd
Abstract. Generative adversarial networks (GANs) are one of the most popular models capable of producing high-quality images. However, most works generate images from a vector of random values, without explicit control over the desired output properties. We study ways of introducing such control for a user-selected region of interest (RoI). First, we overview and analyze the existing works in the areas of image completion (inpainting) and controllable generation. Second, we propose our GAN-based model, which unites approaches from the two mentioned areas, for controllable local content generation. Third, we evaluate the controllability of our model on three accessible datasets – CelebA, Cats, and Cars – and give numerical and visual results of our method.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 22nd
Abstract. Today, virtual and augmented reality applications are becoming more and more popular. This trend creates a demand for 3D processing algorithms applicable to many areas. This work focuses on sign language video sequences. There are a lot of prerecorded photo and video dictionaries that can be transformed into 3D and unified in one place. We research the nuances of hand pose video sequence analysis, as well as the influence of result refinement on 2D and 3D keypoint detection. Besides that, we designed a solution for the parametrization of hand shape and engineered a system for 3D hand pose reconstruction. The model shows good results on training data but lacks generalization. Retraining on multiple datasets and the use of various data augmentation techniques should improve performance.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 21st
Abstract. Novelty is an inherent part of innovations and discoveries. Such processes may be considered as the appearance of new ideas or as the emergence of atypical connections between existing ones. The importance of such connections suggests investigating innovations through a network or graph representation in the space of ideas. In such a representation, a graph node corresponds to a relevant notion (idea), whereas an edge between two nodes means that the corresponding notions have been used in a common context. The question addressed in this research is whether it is possible to identify the edges between existing concepts where innovations may emerge. To this end, a well-documented scientific knowledge landscape has been used. Namely, we downloaded 1.2M arXiv.org manuscripts dated from April 2007 until September 2019 and extracted the relevant concepts from them using the ScienceWISE.info platform. Combining approaches developed in complex network science and graph embedding, we investigate the predictability of the edges (links) on the scientific knowledge landscape where innovations may appear. We argue that the conclusions drawn from this analysis apply not only to scientific knowledge analysis but are rather generic and may be applied to any domain that involves creativity.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 21st
Abstract. Human navigation in information spaces has increasing importance given the ever-growing data sources we possess. Therefore, an efficient navigation strategy would greatly benefit the satisfaction of human information needs. Often, the search space can be understood as a network, and navigation can be seen as a walk on this network. Previous studies have shown that despite not knowing the global network structure, people tend to be efficient at finding what they need. This is usually explained by the fact that people possess some background knowledge. In this work, we explore an adapted version of the network consisting of Wikipedia pages and the links between them, as well as human trails on it. The goal of our research is to find a procedure for labeling articles that are similar to a given one. Among other things, this would lay a foundation for a recommender system for Wikipedia editors that suggests links from a given page to related articles. Our work therefore provides a foundation for enhancing the Wikipedia navigation process, making it more user-friendly.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 21st
Abstract. The maritime industry is huge and consists of many complex processes, a consequence of the fact that the maritime industry handles most goods transportation. During transportation, people serve the vessel, which raises the problem of the optimal distribution of crew across vessels. This problem can be solved by formalizing it as an integer programming problem. In practice, we saw that solving this problem is time-consuming, since there is a large number of free variables, which makes the solution inapplicable for the end user. In this work, we describe an approach to speeding up the solution of crew optimization for the maritime industry using the Rolling Time Horizon technique. Our approach is 3.5 times faster than the benchmark and deviates from the optimal solution by less than 1%.
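The Rolling Time Horizon idea above — solve a short window exactly, freeze its decisions, slide forward — can be sketched as follows. The greedy `solve_window` is a toy stand-in for the per-window integer program, and all names and parameters are illustrative assumptions, not the thesis's actual model:

```python
def solve_window(days, crew, frozen):
    """Toy stand-in for the per-window ILP solve: assign the
    least-used crew member to each day in the window, counting
    usage from already-frozen decisions."""
    usage = {c: 0 for c in crew}
    for c in frozen.values():
        usage[c] += 1
    plan = {}
    for day in days:
        best = min(crew, key=lambda c: usage[c])
        plan[day] = best
        usage[best] += 1
    return plan

def rolling_horizon(horizon, crew, window=7, step=7):
    """Cover the full horizon as a sequence of small windows,
    freezing each window's decisions before moving on. Each
    subproblem has few variables, so it solves quickly, at the
    cost of a small gap to the global optimum."""
    schedule = {}
    for start in range(0, horizon, step):
        days = range(start, min(start + window, horizon))
        schedule.update(solve_window(days, crew, schedule))
    return schedule

schedule = rolling_horizon(30, ["anna", "bohdan", "clara"])
```

The design trade-off is the one the abstract quantifies: each window is cheap to solve, and freezing past decisions keeps the problem size bounded, at the price of a small deviation from the globally optimal schedule.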
Multi-source connectivity as the driver of solar wind variability in the heli... — Sérgio Sacani
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous plasma streams from coronal holes and slow-speed, highly variable streams whose source regions are under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... — Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN — Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... — Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4–0.9 µm) and novel JWST images with 14 filters spanning 0.8–5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3–31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5–15. These objects show compact half-light radii of R_1/2 ∼ 50–200 pc, stellar masses of M⋆ ∼ 10^7–10^8 M⊙, and star-formation rates of SFR ∼ 0.1–1 M⊙ yr^−1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing by which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals, and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993, Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22 nt and 61 nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that these transcripts must be causing the silencing through RNA-RNA interactions.
Types of RNAi (non-coding RNA)
miRNA
Length: 23–25 nt
Trans-acting
Binds the target mRNA with mismatches
Translation inhibition
siRNA
Length: 21 nt
Cis-acting
Binds the target mRNA with a perfectly complementary sequence
piRNA (Piwi-interacting RNA)
Length: 25–36 nt
Expressed in germ cells
Regulates transposon activity
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
THE RISC COMPLEX:
RISC is a large (>500 kDa) multi-protein RNA-binding complex that triggers mRNA degradation.
The double-stranded siRNA is unwound by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease that cleaves the target mRNA.
DICER: endonuclease (RNase III family)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN:
1. PAZ (PIWI/Argonaute/Zwille) – recognition of the target mRNA
2. PIWI (P-element induced wimpy testis) – breaks the phosphodiester bond of mRNA (RNase H activity)
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool used to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been gained using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for ultra-fast, high-resolution imaging of cellular processes over time and space in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, response to treatments, or developmental processes.
In this webinar, we give an overview of advanced applications of the IVM system in preclinical research. IVIM Technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enable researchers to probe fast, dynamic biological processes such as immune cell tracking and cell-cell interaction, as well as vascularization and tumor metastasis, in exceptional detail. This webinar also gives an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allowing for the evaluation of therapeutic interventions in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
7. Related Work
1. Rasiwasia et al. (2010) – cross-modal retrieval for Wikipedia articles. The dataset contains featured articles of the 10 most popular categories. Their approach exploits the correlation between text and image features obtained via latent Dirichlet allocation and SIFT models, respectively.
2. Hessel et al. (2018) – visual concreteness of particular topics for Wikipedia articles. The dataset contains the 192K most popular articles, specifically including images and topics.
3. Dong et al. (2018) – cross-modal retrieval for the Flickr dataset, leveraging deep neural networks.
11. Collection
1. article
a. text content
b. title
2. images
a. raw images
b. metadata: description, title
c. only publicly available
12. Preprocessing
1. text:
a. wiki-markup removal
2. image:
a. converting everything to 600px-width JPEG
b. icon removal
c. title words parsing
d. storing image features (computed with ResNet152)
16. Evaluation Setting
1. image-level split
a. images from the same article might appear in both test and train subsets
b. theoretical model precision with a comprehensive fine-grained dataset
2. article-level split
a. images from the same article are always either in the test or in the train subset
b. real-world performance of this particular model
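The article-level split described in the evaluation setting above can be sketched as a group-aware partition with the article id as the group key. The data layout, 80/20 ratio, and seed below are illustrative assumptions:

```python
import random

def article_level_split(pairs, test_ratio=0.2, seed=42):
    """Split (article_id, image_id) pairs so that all images of any
    given article land entirely in train or entirely in test —
    preventing the leakage that an image-level split allows."""
    articles = sorted({article for article, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(articles)
    n_test = max(1, int(len(articles) * test_ratio))
    test_articles = set(articles[:n_test])
    train = [p for p in pairs if p[0] not in test_articles]
    test = [p for p in pairs if p[0] in test_articles]
    return train, test

# Toy dataset: 100 articles with 3 images each.
pairs = [(a, i) for a in range(100) for i in range(3)]
train, test = article_level_split(pairs)
```

Because whole articles are assigned to one side, the test articles are genuinely unseen, which is why the slide calls this the "real-world performance" setting.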
17. Baseline
An alternative to the multimodal approach is classical text-based techniques. We will experiment with the following models and choose the best one as our baseline:
● word2vec
● wikipedia2vec
● inferText
● co-occurrence
26. Contribution (Conclusions)
1. Dataset collection
a. 36.4K articles
b. 216K images
2. Identify best-performing text-similarity baseline
3. Word2VisualVec model adjustment to our real-world data
a. image-level model outperformed baseline by 145%*
b. article-level model outperformed baseline by 37%*
* performance compared by averaging the R@1, R@3, and R@10 scores
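The averaged R@1, R@3, and R@10 metric used in the footnote above can be computed roughly as follows; the toy ranking data is an illustrative assumption:

```python
def recall_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant item appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mean_recall(queries, ks=(1, 3, 10)):
    """Average R@k over all queries for each k, then average the ks —
    the aggregation described in the slide's footnote."""
    per_k = [
        sum(recall_at_k(ranked, rel, k) for ranked, rel in queries) / len(queries)
        for k in ks
    ]
    return sum(per_k) / len(per_k)

# Two toy queries: (ranked candidate ids, ground-truth id).
queries = [
    (["a", "b", "c", "d"], "a"),  # hit at rank 1
    (["x", "y", "z", "w"], "z"),  # hit at rank 3
]
score = mean_recall(queries)  # (0.5 + 1.0 + 1.0) / 3
```

Here R@1 is 0.5 (only the first query hits at rank 1), while R@3 and R@10 are both 1.0, so the averaged score is 2.5/3.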
27. Future Work
● create an API for our model to be accessible in real time
● adjust the evaluation metric to recognise all photos of the same entity as a correct match, not just the one mentioned in the article
● properly experiment with a compound Word2VisualVec + text-similarity model
● try a more complex model, which learns the best feature representation rather than assuming one
● use more metadata such as article topics
● retrain the model on a bigger “good articles” dataset
28. Review Comments
1. There are no implementation details described for the text encoding methods (see Section 4.3.2), even though they are crucial for proper performance.
a. Rather disagree. All details are described in the original paper of the model’s authors. We concentrated on covering our own contribution in the thesis, but we can see the benefit of replicating this information to make the thesis more self-contained.
2. There are no dataset statistics, train-val split descriptions, and so on, in the thesis or on the relevant Kaggle dataset page.
a. Disagree. Statistics of article/image counts are available. Dataset selection, collection, cleaning, and formatting are described in detail. But we agree that additional EDA would be beneficial.
3. The problems with the presentation are small but numerous.
a. Agree. The experimental section could be presented better.
31. Conclusions
1. Developed a simple cross-modal retrieval model, which significantly outperforms our baseline
2. Showed that performance might be significantly better with a huge fine-grained dataset
3. Developed a simple text-similarity model to show that it contains supplementary predictive power
4. Created a real-world multimodal dataset, which is publicly available