Titash Mandal is a graduate student pursuing a Master's degree in Computer Science at New York University, with extensive experience in programming, technical skills, and academic projects. He has worked as a teaching assistant and as lead instructor for K-12 STEM programs, where he designed curricula and taught topics in computer science and cybersecurity. His research experience includes developing algorithms for image analysis and retinal disease classification.
Towards reproducibility and maximally-open data
Pablo Bernabeu
Presented at the Open Scholarship Prize Competition 2021, organised by Open Scholarship Community Galway.
Video of the presentation: https://nuigalway.mediaspace.kaltura.com/media/OSW2021A+OSCG+Open+Scholarship+Prize+-+The+Final!/1_d7ekd3d3/121659351#t=56:08
Data has always been used by companies, irrespective of domain, to improve operational efficiency and the products themselves. However, analyzing and extracting information from “Big Data” is the next revolution in technology, since previously unknown nuggets of information are now made visible. In fact, over 90% of the data available in the world has been generated in the last two years. “Big Data” analytics has become the next hot topic for most companies - from financial institutions to technology companies to service providers. Likewise, in software engineering, data collected about the development of software, its operation in the field, and users' feedback on it has long been used. However, collecting and analyzing this information across hundreds of thousands or millions of software projects gives us the unique ability to reason about the ecosystem at large, and about software in general. At no time in history has there been easier access to extremely powerful computational resources than there is today, thanks to advances in cloud computing from both the technology and business perspectives. Therefore, it is easier today than ever before to analyze big data.
In this technical briefing, we will present the state of the art in big data analytics for software engineering. We will organize the research along three dimensions:
1) What are the software engineering problems being solved? Examples of problems include: How
much source code is newly written and how much is reused from past projects? Can we
recommend best practices to developers by observing the development of software among
hundreds of thousands of software projects?
2) What are the datasets that are being used? Examples of my datasets include: all the mobile apps
in the Google Play store, all of the world's Open Source projects, and hundreds of gigabytes of
execution logs. Such large datasets provide us with a unique view into the SE field.
3) What are the tools and techniques available to analyze the large datasets? We intend to present
generic software solutions that have been applied to big datasets in other areas of research, and
the tools and techniques created by software engineering researchers.
Finally, we will present the challenges inherent in large datasets - volume, variety, velocity, and veracity. These challenges often complicate the analysis of the data and can invalidate the interpretation of the results. We will conclude with the future opportunities that big data analytics offers for software engineering research.
Machine learning is permeating nearly every industry – from retail and financial services to entertainment and transportation. And, while it's been slow to make its way into healthcare, machine learning stands to transform this space, too… positioning us to better diagnose, predict outcomes, provide follow-up care, and tailor treatments.
In this webinar, PointClear Solutions' Michael Atkins discusses the current state of machine learning in healthcare and what we can expect in the near future:
• What is machine learning and how is it being used today?
• What are some of the risks and obstacles we face in implementing this new technology?
• Looking into the future, what role will machine learning play in transforming healthcare?
• How can my company prepare for machine learning?
A dataset from the National Institute of Justice on crimes in San Francisco. The approach: treat individual crime locations as nodes of a city graph, compute the distances between them, and then apply network analysis.
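The distance-then-graph step described above can be sketched in a few lines of Python. Everything below is invented for illustration: the coordinates, the 1 km linking threshold, and the node names are placeholders for the fields the real NIJ/San Francisco records would supply.

```python
import math

# Hypothetical crime incidents as (latitude, longitude) pairs.
crimes = {
    "A": (37.7749, -122.4194),
    "B": (37.7793, -122.4193),
    "C": (37.7080, -122.4500),
    "D": (37.7795, -122.4180),
}

def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# Build the graph: link any two incidents closer than the threshold.
THRESHOLD_KM = 1.0
edges = {n: set() for n in crimes}
names = list(crimes)
for i, u in enumerate(names):
    for v in names[i + 1:]:
        if haversine_km(crimes[u], crimes[v]) < THRESHOLD_KM:
            edges[u].add(v)
            edges[v].add(u)

# Degree centrality highlights incidents embedded in dense spatial clusters.
centrality = {n: len(nbrs) / (len(crimes) - 1) for n, nbrs in edges.items()}
print(centrality)
```

With these toy points, A, B, and D form a tight cluster while C stays isolated; on the real data, a graph library such as NetworkX would supply richer measures (betweenness, communities) on the same node/edge structure.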
I am a graduate of the University of Washington, Seattle, where I majored in Applied Physics and Applied Mathematics. I spent around 10 months as part of a research group, where my job was initially to learn the toolkit developed and used at CERN for data analysis. Once familiar with it, I used pyROOT to analyze sample CERN data and to eliminate background noise from the data using statistical methods.
Top cited articles 2020 - Advanced Computational Intelligence: An Internation...
aciijournal
Advanced Computational Intelligence: An International Journal (ACII) is a quarterly open-access peer-reviewed journal that publishes articles which contribute new results in all areas of computational intelligence. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced computational intelligence concepts and to establish new collaborations in these areas.
Significant Role of Statistics in Computational Sciences
Editor IJCATR
This paper focuses on issues related to optimizing statistical approaches in the emerging fields of Computer Science and Information Technology, with emphasis on the role of statistical techniques in modern data mining. Statistics is the science of learning from data and of measuring, controlling, and communicating uncertainty. Statistical approaches can make a significant contribution in software engineering, neural networks, data mining, bioinformatics, and other allied fields. Statistical techniques not only help build scientific models but also quantify the reliability, reproducibility, and general uncertainty associated with these models. In the current scenario, large amounts of data are automatically recorded with computers and managed with database management systems (DBMS) for storage and fast retrieval. The practice of examining large preexisting databases in order to generate new information is known as data mining. Data mining has attracted substantial attention in the research and commercial arenas, and it involves the application of a variety of statistical techniques. Twenty years ago most data was collected manually and datasets were simple in form, but the nature of data has since changed considerably. Statistical techniques and computer applications can be used to obtain maximum information from the fewest possible measurements, reducing the cost of data collection.
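As a small illustration of how statistical techniques quantify the uncertainty attached to a model quantity, a bootstrap percentile interval for a mean needs only the Python standard library. The sample values below are made up for the sketch; in practice they would come from a mined database.

```python
import random
import statistics

random.seed(42)  # reproducible resampling

# Hypothetical measurements (placeholder data).
sample = [12.1, 11.8, 12.5, 13.0, 11.6, 12.2, 12.9, 12.4, 11.9, 12.7]

# Bootstrap: resample with replacement many times to approximate the
# sampling distribution of the mean, then read off a 95% interval.
boot_means = sorted(
    statistics.fmean(random.choices(sample, k=len(sample)))
    for _ in range(10_000)
)
lo, hi = boot_means[249], boot_means[9749]  # 2.5th and 97.5th percentiles
print(f"mean = {statistics.fmean(sample):.2f}, 95% CI ~ ({lo:.2f}, {hi:.2f})")
```

The width of the interval is exactly the kind of "general uncertainty" the paper argues a data-mining result should report alongside the point estimate.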
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making. They monitor common gases, weather parameters, and particulates.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9 µm) and novel JWST images with 14 filters spanning 0.8−5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3−31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5−15. These objects show compact half-light radii of R_1/2 ∼ 50−200 pc, stellar masses of M⋆ ∼ 10^7−10^8 M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr^−1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward-modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
Richard's adventures in two entangled wonderlands
Richard Gill
Since the loophole-free Bell experiments of 2015 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of the allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail without altering the empirical predictions of quantum mechanics. I think, however, that it is a smoke screen, and the slogan "lost in math" comes to mind. I will discuss some other recent disproofs of Bell's theorem that use the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial-killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse, because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine- and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services, along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple but effective semantic and latent representations, and to make these available through standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and that of others in the field, creates a baseline for building trustworthy and easy-to-deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Introduction:
RNA interference (RNAi), or post-transcriptional gene silencing (PTGS), is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals, and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61 nt one that Lee discovered).
Silence an mRNA by base pairing with a sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993, Rosalind Lee (Victor Ambros's lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing another gene, lin-14, at the appropriate time in the worm's development.
Two small transcripts of lin-4 (22 nt and 61 nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that these transcripts must cause the silencing through RNA-RNA interactions.
Types of RNAi (non-coding RNA)
miRNA
Length: 23-25 nt
Trans-acting
Binds the target mRNA with mismatches
Causes translation inhibition
siRNA
Length: 21 nt
Cis-acting
Binds the target mRNA through a perfectly complementary sequence
piRNA (Piwi-interacting RNA)
Length: 25-36 nt
Expressed in germ cells
Regulates transposon activity
MECHANISM OF RNAi:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
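As a toy illustration of the sequence-matching step above (the function names and sequences below are invented for the sketch), the perfect-complementarity rule by which a guide strand recognizes its target can be expressed in a few lines of Python:

```python
# RNA base-pairing: A-U and G-C. An siRNA guide strand marks an mRNA for
# degradation when its reverse complement occurs in the target sequence.
COMP = str.maketrans("AUGC", "UACG")

def reverse_complement(rna: str) -> str:
    """Complement each base, then reverse (5'->3' orientation)."""
    return rna.translate(COMP)[::-1]

def targets(sirna_guide: str, mrna: str) -> bool:
    """True if the guide is perfectly complementary to a site in the mRNA."""
    return reverse_complement(sirna_guide) in mrna

# Toy 21-nt guide and a toy mRNA carrying its complementary site.
guide = "UAGCCUAGGAUCCGAUUGCAA"
site = reverse_complement(guide)
mrna = "AUGGGC" + site + "UAAGCU"
print(targets(guide, mrna))
```

A real pipeline would also tolerate the seed-region mismatches that distinguish miRNA-style translational repression from this all-or-nothing siRNA match.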
THE RISC COMPLEX:
RISC is a large (>500 kDa) multi-protein RNA-binding complex that triggers degradation of the target mRNA.
The double-stranded siRNA is unwound by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease that cleaves the target mRNA.
DICER: an endonuclease (RNase III family)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN:
1. PAZ (PIWI/Argonaute/Zwille): recognition of the target mRNA.
2. PIWI (P-element induced wimpy testis): breaks the phosphodiester bond of the mRNA; RNase H activity.
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
Multi-source connectivity as the driver of solar wind variability in the heli...
Sérgio Sacani
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous plasma streams from coronal holes and slow-speed, highly variable streams whose source regions are under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic and then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Comparative structure of adrenal gland in vertebrates
Official resume titash_mandal_
TITASH MANDAL
“A recognized student leader, mentor & innovator with extensive programming, technical and collaborative skills.”
Email: tm2761@nyu.edu | Phone:(716)-275-5897 | Address: 309E, 49th Street, Apartment 10-F, New York, NY – 10017
Website: titashmandal.me | LinkedIn: https://www.linkedin.com/in/titash-mandal-9a6899117 | GitHub: https://github.com/Titash21
EDUCATION
New York University, Tandon School of Engineering, Master of Science in Computer Science, Brooklyn, NY GPA: 3.9/4.00 May 2018
KIIT University, Bachelor of Technology in Computer Science & Engineering, Bhubaneswar, India CGPA: 9.36/10 April 2016
PROGRAMMING/TECHNICAL SKILLS
• Programming Languages: Core Java, Python, C, C++, HTML, CSS, JavaScript
• Operating Systems: Windows OS, Linux (Ubuntu), Mac OS
• Other Skills: Wireshark, IntelliJ, MATLAB, Python IDLE, Atom, MS Word, PowerPoint, Excel, SQL, Android Studio, Eclipse
• Data Skills: TensorFlow, Hadoop, PySpark, Matplotlib, MLlib, Scikit-learn, Pandas, NumPy; Coursera Certification in Neural Networks
WORK EXPERIENCE
Teaching Assistant, Computer Science, Brooklyn, NY September 2017 - Present
Instructed two weekly Python programming labs for undergraduate students, provided weekly programming assignments, and held mentoring sessions to clarify doubts.
Debugged programs, created coding-assignment challenges, clarified programming concepts, and assisted in grading homework and examinations.
Lead Instructor, K-12 STEM Program - Computer Science for Cybersecurity, Brooklyn, NY June 2017 - Present
Mentored a successful pilot program, featured in the Wall Street Journal, for introducing cybersecurity to high school students.
Designed and taught computer science and cybersecurity concepts such as mono-alphabetic and polyalphabetic ciphers, networking protocols, cryptography,
steganography, Python, databases, and website development, plus social engineering topics such as phone snooping and security breaches in social networking websites.
Programmed the cybersecurity challenges and programming activities using tools like Python, Flask, Stepic, Stegsolve, Autopsy, phpMyAdmin, and SQL.
Graduate Research Assistant, New York University, Brooklyn, NY Feb 2017 - June 2017
Developed novel algorithms for color and image analysis, segmentation, feature quantification and statistical analysis related to retinal diseases.
Performed Retinal Image grading and implemented Image Segmentation algorithms for identifying the drusen deposits in the eye of patients.
Research Assistant, IIT Kharagpur, India May 2014 - July 2014
Formulated a C program to test power dissipated in the Memory Built in Self-Test architecture.
Proposed an algorithm for grouping the memory cores based on distance and timing constraint enabling faster calculation.
Researched the reuse of the available memory core network to act as a Test Access Mechanism bringing down the area overhead as well as reduce testing power.
ACADEMIC PROJECTS
Implemented NVIDIA’s research paper on Progressive Growing of Generative Adversarial Networks. (TensorFlow, Python) April 2018 – May 2018
Built and trained generator and discriminator networks to generate synthetic images from latent noise using Pixel normalization, WGAN Loss, Sliced Wasserstein
distance, slow progressive growing from lower resolution to higher resolution on CELEBA and LSUN datasets.
Implemented a Deep Neural Network and a Convolutional Neural Network on CIFAR-10. (TensorFlow, Python, Pandas, NumPy) Feb 2018 – March 2018
Built and trained a deep L-layer Neural Network and analyzed matrix and vector dimensions to check neural network implementations.
Illustrated hyper parameter tuning to improve the performance of forward propagation and backward propagation.
Implemented convolution and pooling operation with padding, strides and proper filter for image multi-class classification.
Analyzing levels of air pollution, explaining changes in global land temperatures and relating pollution to deaths and diseases using Big Data Technology.
(Spark, PySpark, SparkSQL, MLLib, Matplotlib, R) October 2017 - October 2017
Performed data analysis to determine the most polluted cities in US based on AQI levels of gases using PySpark and SparkSQL.
Correlated multiple datasets of pollution and global land temperature to determine how pollution levels affect yearly average temperatures in different cities.
Joined three datasets namely: Pollution Dataset, Land Temperature Dataset and Global Death dataset to analyze changes in yearly death rates due to pollution.
Performed an exploratory analysis on daily tweets to gauge current awareness, using machine learning algorithms to identify positive and negative sentiments.
Applying Data Science and Data Mining tools to analyze the factors causing Doctors to prescribe branded vs generic drugs. October 2017 - October 2017
Performed feature engineering to identify the most important attributes using entropy and mutual information, converted the categorical attributes to numerical
attributes, removed selection bias in the dataset, identified and mitigated concept drift, and chose accuracy as the performance measure for the problem.
Implemented feature reduction using Principal Component Analysis and tree-based pruning to avoid memory leakage.
Applied various ML algorithms such as random forest, decision tree, and AdaBoost to identify the best regression algorithm for the problem set and generate conclusions.
Analyzing primary factors for customer churn in an organization using Predictive Modeling (Python) October 2017 - October 2017
Implemented a predictive model to identify factors causing customer churn using Decision Trees and identified correlation between the driving factors of churn.
Motion detection to navigate a game in Optical Flow (Python, OpenCV) May 2017 - May 2017
Developed a program that performs real-time analysis of a video captured by the webcam to detect a specific colored object in the video frame using OpenCV.
Applied the Lucas Kanade algorithm to the individual video frames to determine the flow and velocity vectors of the object.
Implemented an algorithm to link the displacement vectors calculated during the object’s motion to navigate a game in real-time using motion detection.
LEADERSHIP AWARDS & APPOINTMENTS
Graduate Orientation Captain, NYU, Brooklyn, NY Jan 2018 – Jan 2018
Executive Board Member, Web Master, SHPE (Society of Hispanic Professional Engineers), Brooklyn, NY December 2017 - Present
Student Leadership Award Recipient, NYU, Brooklyn, NY March 2017 - March 2017
Participant and Mentee – Womentorship Program, NYU, Brooklyn, NY Jan 2017- December 2017
Lead Dancer, Team Leader, Member, UG, KIIT University, Bhubaneswar, India April 2013 - April 2015