HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc... (Sri Ambati)
Presented at #H2OWorld 2017 in Mountain View, CA.
Enjoy the video: https://youtu.be/-qfEOwm5Th4.
Learn more about H2O.ai: https://www.h2o.ai/.
Follow @h2oai: https://twitter.com/h2oai.
- - -
In this talk, we discuss how we implemented H2O and LIME to predict and explain employee turnover on the IBM Watson HR Employee Attrition dataset. We use H2O’s new automated machine learning algorithm to improve on the accuracy of IBM Watson. We use LIME to produce feature importance and ultimately explain the black-box model produced by H2O.
Matt Dancho is the founder of Business Science (www.business-science.io), a consulting firm that assists organizations in applying data science to business applications. He is the creator of the R packages tidyquant and timetk and has been working with data science for business and financial analysis since 2011. Matt holds master’s degrees in business and engineering, and has extensive experience in business intelligence, data mining, time series analysis, statistics and machine learning. Connect with Matt on Twitter (https://twitter.com/mdancho84) and LinkedIn (https://www.linkedin.com/in/mattdancho/).
Slides of my talk on distributed deep learning concepts and platforms, from the "Deep Learning for Poets" workshop at Tehran Polytechnic on December 19th, 2018.
While the Rio 2016 Olympics are winding down and the final medals are being handed out, we thought we would share a bit of work that was done recently by Rik Van Bruggen to explore a really interesting dataset in Neo4j.
Based on an original public dataset from the UK newspaper The Guardian, Rik completed the medallist dataset to cover over 30,000 Olympians between 1896 and 2012. He created a graph model, loaded the data, and wrote a set of example queries that yielded some very interesting results. Join us for this 30-minute webinar, where we’ll take you through this Olympian graph so you can take the data for a spin yourself afterwards.
Introduction to ETL, ETL vs. data pipelines, and what it looks like when we process big data. The challenges, complications, and things we should consider when architecting a big data system.
Stream processing vs. batch processing, and how we can combine both using the Lambda architecture.
Learn more:
aka.ms/data-guide
aka.ms/stream-processing
aka.ms/building-blocks
aka.ms/start-with-the-cloud
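As a rough illustration of how the Lambda architecture combines batch and stream processing, the Python sketch below models a batch layer that recomputes a view from the full master dataset, a speed layer that counts only the events that arrived since the last batch run, and a serving-layer query that merges the two. It assumes a toy workload (event counts keyed by a string); the names `batch_view`, `SpeedLayer` and `query` are hypothetical and are not taken from the talk or from any Azure service.

```python
from collections import Counter
from typing import Iterable


def batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute counts from the full, immutable master dataset."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, key: str) -> None:
        self.realtime_view[key] += 1

    def reset(self) -> None:
        # Called once a fresh batch view has absorbed these events.
        self.realtime_view.clear()


def query(key: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the complete-but-stale batch view with the real-time delta."""
    # Counter returns 0 for missing keys, so unseen keys are handled naturally.
    return batch[key] + speed.realtime_view[key]


if __name__ == "__main__":
    history = ["home", "cart", "home"]   # hypothetical historical events
    batch = batch_view(history)
    speed = SpeedLayer()
    for event in ["home", "checkout"]:   # hypothetical events since the last batch
        speed.on_event(event)
    print(query("home", batch, speed))      # 3
    print(query("checkout", batch, speed))  # 1
```

The design choice the sketch tries to show is that the batch layer can stay simple and exact because the speed layer only has to cover the gap until the next batch run, after which its view is discarded.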
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One (Sri Ambati)
Presented at #H2OWorld 2017 in Mountain View, CA.
Learn more about H2O.ai: https://www.h2o.ai/.
Follow @h2oai: https://twitter.com/h2oai.
- - -
Effective volume anomaly detection presents unique challenges when monitoring customer transaction volumes across thousands of platforms and systems. We overcome this by using H2O, building on open source tools, and delivering machine learning anomaly detection at enterprise scale. Hear how we model, visualize, and then automatically alert on anomalous mobile app volumes in real time.
Donald Gennetten has over 15 years of experience supporting digital channels in the Financial Services industry. In his current role as a Data Engineer for Capital One’s Monitoring Intelligence team, he leads a cross-functional group of Data, Business, and Engineering subject matter experts to deliver Advanced Analytics solutions for real-time customer transaction monitoring and issue detection.
Rahul Gupta is a Data Engineer in Capital One's Center for Machine Learning, focusing heavily on back-end development and model creation. His primary efforts include building an Algorithmic IT Operations (AIOps) platform that utilizes a combination of batch and streaming data with Machine Learning capabilities to improve the stability of Capital One services and overall customer experience.
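The talk describes doing this with H2O; as a much simpler, swapped-in baseline, the sketch below flags a transaction-volume reading when it lies more than a chosen number of standard deviations from the mean of a trailing window (a rolling z-score). The function name `rolling_zscore_alerts`, the 12-point window and the threshold of 3 are illustrative assumptions, not Capital One's method.

```python
import statistics
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def rolling_zscore_alerts(
    volumes: Iterable[float], window: int = 12, threshold: float = 3.0
) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, volume, z) for readings that deviate from the trailing window."""
    history: Deque[float] = deque(maxlen=window)
    for i, v in enumerate(volumes):
        # Only score once a full window of past readings is available.
        if len(history) == window:
            mean = statistics.fmean(history)
            std = statistics.pstdev(history)
            # Skip scoring when the window is perfectly flat (std == 0).
            if std > 0:
                z = (v - mean) / std
                if abs(z) >= threshold:
                    yield i, v, z
        history.append(v)


if __name__ == "__main__":
    # Hypothetical per-interval mobile app volumes with one sudden drop.
    volumes = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100, 20, 100]
    for idx, vol, z in rolling_zscore_alerts(volumes):
        print(f"alert at interval {idx}: volume={vol}, z={z:.1f}")
```

Note that flagged readings are still appended to the window, so a sustained outage gradually becomes the new baseline; a production system might exclude flagged points or use robust statistics such as the median instead.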
Production-Ready BIG ML Workflows - from zero to hero (Daniel Marcous)
Data science isn't an easy task to pull off.
You start with exploring data and experimenting with models.
Finally, you find some amazing insight!
What now?
How do you turn a small experiment into a production-ready workflow? Better yet, how do you scale it from a small sample in R/Python to TBs of production data?
Building a BIG ML Workflow - from zero to hero is about the work process you need to follow to get a production-ready workflow up and running.
Covering:
* Small- to medium-scale experimentation (R)
* Big data implementation (Spark MLlib + pipeline)
* Setting metrics and checks in place
* Ad hoc querying and exploring your results (Zeppelin)
* Pain points & lessons learned the hard way (is there any other way?)
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product... (Sri Ambati)
Presented at #H2OWorld 2017 in Mountain View, CA.
Enjoy the video: https://youtu.be/U-ENrMUQcJs.
Learn more about H2O.ai: https://www.h2o.ai/.
Follow @h2oai: https://twitter.com/h2oai.
- - -
Mark Landry is a competitive data scientist and a product manager at H2O.ai. He is skilled at getting quick solutions to iterate on and enjoys testing ideas in Kaggle competitions, where his worldwide ranking stands in the top 0.03%. His first encounter with H2O.ai was when he was hacking in R. He reached out to Arno Candel (CTO, H2O.ai) to team up in a Kaggle competition, as he felt it would be exciting to work with the lead developers of the tool that contributed to his work in R. Mark holds a B.S. in Computer Science from Mississippi State University and was a Principal Engineer at Dell before joining H2O.ai. At Dell, he spearheaded data modeling and project support for the business transformation team, and also developed analytical tools and machine learning models to increase business efficiency.
Mark was the first Kaggle Grandmaster to be employed at H2O.ai, and he helped H2O make inroads into the Kaggle community. At H2O.ai, he has helped modernize the GBM algorithm and provided guidance on multiple projects before being pulled into one of H2O’s biggest projects. He is interested in multi-model architectures and helps the world make fewer models that perform worse than the mean. He also has a number of publications to his name, cited by industry and academia alike.
Machine learning applications are typically stitched together from hopes and dreams, shell scripts, cron jobs, home-grown schedulers, snippets of configuration clipped from multiple blog posts, thousands of hard-coded business rules, a.k.a. "our SQL corpus," and a few lines of training and testing code. Organizing all the moving parts into something maintainable and supportive of ongoing development is a challenge most teams have on their TODO list, roadmap, or tech debt pile. Getting ahead of the day-to-day demands and settling into a sane architecture often seems like an unattainable goal. The past several years have seen an explosion of tool-building in the data engineering and analytics area, including in Apache projects spanning the areas of search and information retrieval, job orchestration, file and stream formats, and machine learning libraries. In this talk we will cover our product and development teams' choices of architecture and tools, from data ingestion and storage, through transformations and processing, to presentation of results and publishing to web services, reports, and applications.
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa... (Tomasz Bednarz)
Presented at the ACEMS workshop at QUT in February 2015.
Credits: whole project team (names listed in the first slide).
Approved by CSIRO to be shared externally.
Relationships Matter: Using Connected Data for Better Machine Learning (Neo4j)
Relationships are highly predictive of behavior, yet most data science models overlook this information because it's difficult to extract network structure for use in machine learning (ML).
With graphs, relationships are embedded in the data itself, making it practical to add these predictive capabilities to your existing practices.
That’s why we’re presenting and demoing the use of graph-native ML to make breakthrough predictions. This will cover:
- Different approaches to graph feature engineering, from queries and algorithms to embeddings
- How ML techniques leverage everything from classical network science to deep learning and graph convolutional neural networks
- How to generate representations of your graph using graph embeddings, create ML models for link prediction or node classification, and apply these models to add missing information to an existing graph/incoming data
- Why no-code visualization and prototyping is important
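To make the "classical network science" end of that spectrum concrete, here is a minimal plain-Python sketch of one such heuristic, the Jaccard coefficient of two nodes' neighbourhoods, used to rank unconnected node pairs as candidate missing links. It is an illustrative assumption: it does not use Neo4j or its Graph Data Science library, and the function names and toy edge list are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: List[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def jaccard(g: Graph, u: str, v: str) -> float:
    """|N(u) & N(v)| / |N(u) | N(v)|: share of neighbours the two nodes have in common."""
    nu, nv = g.get(u, set()), g.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def rank_missing_links(g: Graph) -> List[Tuple[str, str, float]]:
    """Score every currently unconnected pair and sort by descending Jaccard score."""
    candidates = [
        (u, v, jaccard(g, u, v))
        for u, v in combinations(sorted(g), 2)
        if v not in g[u]
    ]
    return sorted(candidates, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    g = build_graph([
        ("alice", "bob"), ("alice", "carol"),
        ("bob", "dave"), ("carol", "dave"), ("dave", "erin"),
    ])
    for u, v, score in rank_missing_links(g)[:3]:
        print(f"{u} - {v}: {score:.2f}")
```

In a graph-native ML workflow, scores like this would typically be one feature among many (alongside algorithm outputs and embeddings) fed to a link prediction model, rather than a predictor on their own.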
Talk on Data Discovery and Metadata by Mark Grover from July 2019.
Goes into detail on the problem, the build/buy/adopt analysis, and Lyft's solution, Amundsen, along with thoughts on the future.
Introduction to Automatic Machine Learning (Sri Ambati)
How can you bring machine learning to the masses? Machine learning projects struggle with finding talent, the time it takes to build and deploy models, and trusting the models that get built.
How can multiple teams in your organization build accurate ML models without being experts in data science or machine learning?
Wondering about the different flavors of AutoML?
H2O Driverless AI employs the techniques of expert data scientists in an easy-to-use application that helps scale your data science efforts. Driverless AI lets data scientists work on projects faster by using automation and state-of-the-art GPU computing power to accomplish in minutes tasks that used to take months.
With H2O Driverless AI, everyone, including expert and junior data scientists, domain scientists, and data engineers, can develop trusted machine learning models. This next-generation machine learning platform offers unique and advanced functionality for data visualization, feature engineering, model interpretability, and low-latency deployment.
H2O Driverless AI offers:
* Automatic data visualization
* Automatic feature engineering at the Grandmaster level
* Automatic model selection
* Automatic model tuning and training
* Automatic parallelization using multiple CPUs or GPUs
* Automatic model ensembling
* Automatic machine learning interpretability (MLI)
* Automatic scoring code generation
Want to try it yourself? You can get a free trial here: H2O Driverless AI trial.
Come to this session to discover how to get started with automatic machine learning using H2O Driverless AI, and build powerful models with just a few clicks.
See you soon!
About H2O.ai
H2O.ai is a visionary open-source software company from Silicon Valley that created and reimagined what is possible. We are a company of makers who have brought new platforms and technologies to market to drive the artificial intelligence movement. We are the creators of H2O, the leading open-source data science and machine learning platform, used by nearly half of the Fortune 500 and trusted by more than 14,000 organizations and hundreds of thousands of data scientists around the world.
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D... (Sri Ambati)
Presented at #H2OWorld 2017 in Mountain View, CA.
Enjoy the video: https://youtu.be/-rGRHrED94Y.
Learn more about H2O.ai: https://www.h2o.ai/.
Follow @h2oai: https://twitter.com/h2oai.
- - -
Abstract:
Most machine learning systems enable two essential processes: creating a model and applying the model in a repeatable and controlled fashion. These two processes are interrelated and pose technological and organizational challenges as they evolve from research to prototype to production. This presentation outlines common design patterns for tackling such challenges while implementing machine learning in a production environment.
Sergei's Bio:
Dr. Sergei Izrailev is Chief Data Scientist at BeeswaxIO, where he is responsible for data strategy and building AI applications powering the next generation of real-time bidding technology. Before Beeswax, Sergei led data science teams at Integral Ad Science and Collective, where he focused on architecture, development and scaling of data science based advertising technology products. Prior to advertising, Sergei was a quant/trader and developed trading strategies and portfolio optimization methodologies. Previously, he worked as a senior scientist at Johnson & Johnson, where he developed intelligent tools for structure-based drug discovery. Sergei holds a Ph.D. in Physics and Master of Computer Science degrees from the University of Illinois at Urbana-Champaign.
Data Workflows for Machine Learning - Seattle DAML (Paco Nathan)
First public meetup at Twitter Seattle, for Seattle DAML:
http://www.meetup.com/Seattle-DAML/events/159043422/
We compare/contrast several open source frameworks which have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Py libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, MBrace on .NET, etc. The analysis develops several points for "best of breed" and what features would be great to see across the board for many frameworks... leading up to a "scorecard" to help evaluate different alternatives. We also review the PMML standard for migrating predictive models, e.g., from SAS to Hadoop.
Speaker: Franz Walder, Product Manager, panagenda
Abstract: panagenda reached out to 750+ professionals to share their company’s Domino application strategy. Join this session to find out what was most important to your peers and what challenges they had to overcome to make their project a success. Find out about the critical questions everybody should ask and have answers to throughout their project. Franz Walder presents the exciting results of the survey and explains what role analytics can play when tackling these challenges.
Talk given at the first OmniSci user conference, where I discuss cooperating with open-source communities to ensure you get useful answers from your data quickly. I also get a chance to introduce OpenTeams in this talk and discuss how it can help companies cooperate with communities.
Algorithm Marketplace and the new "Algorithm Economy" (Diego Oppenheimer)
Talk by Diego Oppenheimer, CEO of Algorithmia.com, at Data Day Texas 2016.
Peter Sondergaard, VP of Research for Gartner, recently said the next digital gold rush is "How we do something with data not just what you do with it". In this talk we will cover a brief history of the algorithmic advances in computer vision, natural language processing, machine learning and general AI, and how they are being applied to Big Data today. From there we will talk about how algorithms are playing a crucial part in the next Big Data revolution and the new opportunities opening up for startups and large companies alike, and take a first look at the role Algorithm Marketplaces will play in this space.
Similar to Data science tools - A.Marchev and K.Haralampiev (20)
[Data Meetup] Data Science in Finance - Factor Models in Finance (Data Science Society)
In this talk Metodi Nikolov, a Quantitative Researcher, reviews, without aiming to be exhaustive, the use of factor models in finance: from the simplest single-factor linear regression models, through latent-variable models, and beyond. The focus is not solely on stocks; the talk also explores other data types. The hope is to give listeners an appreciation of the different ways these models can be applied.
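The simplest case mentioned above, a single-factor linear regression r_asset = α + β·r_factor + ε, can be fit by ordinary least squares in a few lines. The sketch below is a hedged illustration with hypothetical return series and a hypothetical function name; it is not taken from the talk, and real factor work would typically use a statistics library and handle missing data and regression diagnostics.

```python
from statistics import fmean
from typing import Sequence, Tuple


def fit_single_factor(asset: Sequence[float], factor: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of asset = alpha + beta * factor + noise.

    Returns (alpha, beta), where beta = Cov(asset, factor) / Var(factor)
    and alpha = mean(asset) - beta * mean(factor).
    """
    if len(asset) != len(factor) or len(asset) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_a, mean_f = fmean(asset), fmean(factor)
    # Sums of cross-products and squares; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(asset, factor))
    var = sum((f - mean_f) ** 2 for f in factor)
    if var == 0:
        raise ValueError("factor returns have zero variance")
    beta = cov / var
    alpha = mean_a - beta * mean_f
    return alpha, beta


if __name__ == "__main__":
    # Hypothetical daily returns of a market factor and a stock that tracks it.
    market = [0.010, -0.020, 0.015, 0.005, -0.010]
    stock = [0.013, -0.025, 0.021, 0.006, -0.012]
    alpha, beta = fit_single_factor(stock, market)
    print(f"alpha={alpha:.5f}, beta={beta:.2f}")
```

Here β measures the stock's sensitivity to the factor and α the average return left unexplained by it; multi-factor and latent-variable models generalize the same idea to several, possibly unobserved, factors.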
[Data Meetup] Data Science in Finance - Building a Quant ML pipeline (Data Science Society)
Georgi Kirov shares a common market-neutral statistical arbitrage framework. It will help showcase the many different ways to structure a systematic research project. From data reconciliation and signal backtesting to optimization and execution, what are some principled ways to evaluate and compare ML ideas? This process inevitably depends on the characteristics of a specific strategy, for instance, if it is liquidity-taking or liquidity-making.
[Data Meetup] Data Science in Journalism - Tanbih, QCRI and MIT (Data Science Society)
Check out our Data Science Meetup devoted to Data Science in Journalism
Dr. Preslav Nakov, Principal Scientist at QCRI, presented the #Tanbih news aggregator, which makes people aware of what they are reading.
The aggregator features media profiles that show the general factuality of reporting, the degree of propagandistic content, the hyper-partisanship, the leading political ideology, the general frame of reporting, the stance with respect to various claims and topics, as well as the audience reach and the audience bias in social media. This is part of the Tanbih project, which is developed in collaboration with MIT.
Special thanks to our partners from #Ontotext, #Telelink and #Leanplum!
#DSS #DataMeetup
Vassil Lunchev, CEO of Homeheed (https://www.homeheed.com/), presented at our July Meetup how to detect fake listings using #ComputerVision and #MachineLearning.
Imagine that you have 600,000 real estate listings with a total of 5,000,000 photos, and you know that many of these listings are fake. In his presentation, Vassil shared some of the challenges of detecting the fake ones, including the approaches that work and those that do not. He also presented what kind of additional data is necessary to detect the fake listings.
Boyan Bonev and Demir Tonchev from Gaida.AI covered the process from the very initial concept to working software. Their talk at our July Meetup focused on the challenges the real estate domain presents for some of the standard approaches and models (#CollaborativeFiltering).
The presentation covers everything from #DataExploration and #modeling to the nitty-gritty of getting it all up and running in a production environment.
Demir and Boyan shared lessons they learned, mistakes they made and things they are still looking to improve in Gaida.AI (https://www.gaida.ai/).
Lessons Learned: Linked Open Data implemented in 2 Use Cases (Data Science Society)
In this presentation for the ESSnet Linked Open Statistics final event, Sergi Segiev presents the lessons learned from two implemented use cases that used open data to find valuable insights.
You can also refer to the presentation 'Data Reveals Corruption Practices' by Yasen Kiprov - http://bit.ly/2WsFxsP
The presentation "AI methods for localization in a noisy environment", by Ana Antonova and Kameliya Kosekova, was given at Robotics Days '19.
In the following slides, you can find more detailed information on techniques for robot localization, along with several GitHub repos on the topic.
Team Nishki, made up of 11th-graders, presents a hackathon ML solution to the Kaufland Airmap case, for which they won a Datathon special award.
Used methodologies and algorithms: OCR, DarkFlow, YOLO
The solution can be found at:
https://www.datasciencesociety.net/datathon/kaufland-case-datathon-2019/
Team: Evgeni Dimov, Kalin Doichev, Kostadin Kostadinov and Aneta Tsvetkova
Data Science for Open Innovation in SMEs and Large Corporations (Data Science Society)
Latest trends in Data Science and why open-source culture and open innovation are expanding so fast. Find out more about Data Science Society, its latest activities, and how it cooperates with different local communities around the world to stimulate new forms of education. At the end of the presentation are the results of two business cases, from a telecom company (SNA) and a German retailer (object detection), which were solved during the Data Science Society’s hackathons (Global Datathons).
Air Pollution in Sofia - Solution through Data Science by Kiwi team (Data Science Society)
Some of you already know how serious the problem of air pollution is in Sofia, the capital of Bulgaria, but ...
▶️Do you know how it could be solved?
Our community, represented by 1,800 members all around the world, tried to tackle the issue at our previous #GlobalDatathon and in our international #DataScience #MonthlyChallenge, part of a university program.
Team Kiwi is solving the problem by implementing algorithms and statistical methods for air pollution prediction in the next 24 hours.
#AcademiaDatathon Finalists' Solution of Crypto Datathon Case (Data Science Society)
Team UNWE, one of the finalists from #AcademiaDatathon, will present their solution to the #cryptocurrency data case. Explore how to perform data modeling with ARIMA and neural networks.
To learn more visit: https://bit.ly/2uhfF37
Video from the presentation: https://bit.ly/2LlaeYd
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018 (Data Science Society)
The whole NLP Data Science solution @ https://goo.gl/iEFb1L
Syntactic parsing, here dependency parsing, is the task of analyzing a sentence and assigning a syntactic structure to it. The most widely used syntactic structure is the parse tree, which can be generated using parsing algorithms. Parse trees are useful in various applications such as grammar checking, and, more importantly, they play a critical role in the semantic analysis stage. For example, to answer the question "Who is the point guard for the LA Lakers in the next game?" we need to identify its subject, objects, and attributes, which tells us that the user wants the point guard of the LA Lakers specifically for the next game. This identification and extraction NLP task was the main focus for team Coala at the First Global Online Datathon.
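To illustrate how a dependency parse exposes this structure, the sketch below stores the example question as a list of tokens, each with the index of its head and a dependency label, then reads off the subject phrase and its prepositional modifiers by walking the tree. The parse is hand-labeled with spaCy-style labels as an assumption, not the output of a real parser (an actual parser may attach "in the next game" differently), and the helper names are hypothetical.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    head: int  # index of the head token; -1 marks the root
    dep: str   # dependency label (hand-assigned, spaCy-style)


# Hand-labeled parse of the example question (illustrative assumption).
SENTENCE: List[Token] = [
    Token("Who", 1, "attr"),
    Token("is", -1, "ROOT"),
    Token("the", 4, "det"),
    Token("point", 4, "compound"),
    Token("guard", 1, "nsubj"),
    Token("for", 4, "prep"),
    Token("the", 8, "det"),
    Token("LA", 8, "compound"),
    Token("Lakers", 5, "pobj"),
    Token("in", 4, "prep"),
    Token("the", 12, "det"),
    Token("next", 12, "amod"),
    Token("game", 9, "pobj"),
    Token("?", 1, "punct"),
]


def children(tokens: List[Token], i: int) -> List[int]:
    """Indices of the tokens whose head is token i, in sentence order."""
    return [j for j, t in enumerate(tokens) if t.head == i]


def subtree(tokens: List[Token], i: int) -> List[int]:
    """All token indices dominated by i (including i), in sentence order."""
    stack, seen = [i], set()
    while stack:
        k = stack.pop()
        if k in seen:
            continue
        seen.add(k)
        stack.extend(children(tokens, k))
    return sorted(seen)


def phrase(tokens: List[Token], i: int) -> str:
    """Surface text of the phrase headed by token i."""
    return " ".join(tokens[k].text for k in subtree(tokens, i))


def first_with_dep(tokens: List[Token], dep: str) -> int:
    """Index of the first token carrying the given dependency label."""
    return next(i for i, t in enumerate(tokens) if t.dep == dep)


if __name__ == "__main__":
    subj = first_with_dep(SENTENCE, "nsubj")
    print("subject:", phrase(SENTENCE, subj))
    for c in children(SENTENCE, subj):
        if SENTENCE[c].dep == "prep":
            print("modifier:", phrase(SENTENCE, c))
```

Because "for the LA Lakers" and "in the next game" both hang off "guard" in this parse, a downstream semantic step can read them as constraints on which point guard is meant.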
DNA Analytics - What Really Goes into Sausages - Datathon 2018 Solution (Data Science Society)
Link to whole Data Science solution: https://goo.gl/nY3iuE
The task for the Telelink case of the First Global Datathon 2018 is to obtain the complete set of genome traces found in a single food sample and to identify ALL organisms that should not be present in it. The business needs a solution to this DNA sequence identification case to improve quality control in supply chain supervision and in health care and protection.
- by Polina Krustanova
Open Data reveals corruption practices - case from Datathon 2017 (Data Science Society)
A practical presentation of what Chereshka did over two days, combining the Trade Register and Public Tenders data. Yasen tells us how linked data helped the team integrate and query the two sources. He will also show some interesting initial findings, including people who participate both in the tender request and in the management of the selected bidder.
Richard's adventures in two entangled wonderlands (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think, however, that it is a smoke screen, and the slogan "lost in math" comes to mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr... (Travis Hills MN)
Travis Hills of Minnesota developed a method to convert waste into high-value dry fertilizer, significantly enriching soil quality. By providing farmers with a valuable resource derived from waste, Travis Hills helps enhance farm profitability while promoting environmental stewardship. Travis Hills' sustainable practices lead to cost savings and increased revenue for farmers by improving resource efficiency and reducing waste.
This presentation gives a brief overview of the structural and functional attributes of nucleotides and the structure and function of genetic material, along with the effects of UV rays and pH on them.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, and particulates.
ESR spectroscopy in liquid food and beverages.pptx (PRIYANKA PATEL)
With a growing population, people increasingly rely on packaged foodstuffs, and packaging food materials requires preserving the food. There are various methods for treating food to preserve it, and irradiation is one of them. It is the most common and the most harmless method of food preservation, as it does not alter the essential micronutrients of food materials. Although irradiated food does not harm human health, quality assessment is still required to provide consumers with the necessary information about the food. ESR spectroscopy is the most sophisticated way to investigate food quality and the free radicals induced during food processing. The ESR spin trapping technique is useful for detecting highly unstable radicals in food. The antioxidant capacity of liquid food and beverages is mainly assessed using the spin trapping technique.
BREEDING METHODS FOR DISEASE RESISTANCE.pptx (RASHMI M G)
Plant breeding for disease resistance is a strategy to reduce crop losses caused by disease. Plants have an innate immune system that allows them to recognize pathogens and provide resistance. However, breeding for long-lasting resistance often involves combining multiple resistance genes.
ANOMALOUS SECONDARY GROWTH IN DICOT ROOTS.pptx (RASHMI M G)
This presentation covers abnormal, or anomalous, secondary growth in plants. It defines secondary growth as an increase in plant girth due to the activity of the vascular cambium or cork cambium. Anomalous secondary growth does not follow the normal pattern of a single vascular cambium producing xylem internally and phloem externally.
Nutraceutical market, scope and growth: Herbal drug technology (Lokesh Patil)
As consumer awareness of health and wellness rises, the nutraceutical market, which includes products such as functional foods, drinks, and dietary supplements that provide health benefits beyond basic nutrition, is growing significantly. As healthcare expenses rise, the population ages, and people increasingly want natural and preventive health solutions, this industry is expanding quickly. Product formulation innovations and the use of cutting-edge technology for customized nutrition are further driving market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing and to provide significant opportunities for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx (MAGOTI ERNEST)
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and ’70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans, and then used as an off-the-shelf food requiring only 24 h of incubation, makes them the most convenient, least labor-intensive, live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant, but varies both geographically and temporally. During the last decade, however, both the causes of Artemia nutritional variability and methods to improve poor-quality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for cultivation of fish, crustacean, and shellfish larvae. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represent another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
Seminar on U.V. Spectroscopy by SAMIR PANDA
Spectroscopy is the branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that measures the amount of light absorbed by the analyte.
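Quantitative UV-VIS absorption work rests on absorbance, which the Beer-Lambert law relates to concentration for dilute solutions: A = -log10(I/I0) = ε·l·c, where ε is the molar absorptivity, l the path length and c the concentration. The Python sketch below is only an illustration of that relation; the molar absorptivity used is a hypothetical placeholder, not a value from the seminar.

```python
import math


def absorbance_from_transmittance(transmittance: float) -> float:
    """Absorbance A = -log10(T), where T = I / I0 is the transmitted fraction."""
    if not 0.0 < transmittance <= 1.0:
        raise ValueError("transmittance must be in (0, 1]")
    return -math.log10(transmittance)


def concentration(absorbance: float, molar_absorptivity: float,
                  path_length_cm: float = 1.0) -> float:
    """Beer-Lambert law A = epsilon * l * c, solved for c (mol/L)."""
    return absorbance / (molar_absorptivity * path_length_cm)


# Example: 10% of the incident light passes through a 1 cm cuvette.
a = absorbance_from_transmittance(0.10)
# Hypothetical molar absorptivity of 10,000 L mol^-1 cm^-1 for the analyte.
c = concentration(a, molar_absorptivity=10_000.0)
print(f"A = {a:.3f}, c = {c:.2e} mol/L")
```

The linear relation only holds in the dilute range, so real measurements are usually checked against a calibration curve.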
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige... (University of Maribor)
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
3. What have we done?
• Over 3000 members
• Cooperation with communities and universities from Europe and Asia
• More than 50 countries
• 25 real cases in 3 Datathons
• 63 superb solutions
• Students and experts with up to 20 years of experience
• 4 years
• Areas: Machine Learning, NLP, Data enrichment, Computer Vision and AI
• Working with SMEs and big companies
4. What have we dared to do?
• Hack the Fake News Datathon
• Our first event
• Two projects with 30 volunteers
• Big data conference
• The First online #Datathon2018
• Academia #Datathon
• Over 50 meetups
• Participation in 8 conferences
• 2 workshops
• Hack the News #Datathon
• #Datathon2018 v2
• First Datathon in CEE
Timeline labels: 2014, Nov 2015, 2016, 2017, 2018; Feb, Mar, Apr, May, Jun, Sep
5. Past #Datathon2018
• 144 participants
• 39 teams, 9 cases
• 493 chat rooms
• 16563 messages exchanged
• 32 mentors and industry experts
• 24 countries
• The #Datathon2018 participants managed to solve all cases
• There was great fun, with more than 4 fun sessions
• A lot of beer and pizza was consumed
• 38 quality data solutions at the end
• Great results, challenging even for the companies
6. Impressions
• Milena Yankova, Head of Research & Innovation
• Shashank Shekhar, Manager - Data Sciences
• Agamemnon Baltagiannis, Principal Data Scientist
• Tomislav Križan, CPO, Member of the Board
“The results of our case are impressive and have further motivated our R&D department to explore more opportunities and apply some of the team results that worked on it.”
“The best thing about this Datathon was its global footprint. I was amazed by the sheer enthusiasm that the participants demonstrated. The resilience and adaptability shown by a lot of them in providing a working solution to real life problems made this Datathon a huge success.”
“Thank you all for this great weekend. It was a fantastic challenge and I am happy that I saw deep technical work from all the participants! I will be always here to support the DSS community”
“From all finalists we did see good and novel approach... also those who didn't arrive to finals, were also really close ... so good job to all teams!”
“The teams' solutions were well documented in CRISP-DM Methodology at Datathon 2018 organized by DSS, in which Kaufland was proud to participate”
8. Introduction
• It is impossible to cover all tools, so we limited the list to the ones we use
• Still, the task is hard, due to:
– Various types of tools (noise in the input data)
– Many criteria (so a multi-dimensional problem)
– Tools for many purposes (overlapping categories)
• Hmmmm..!? Sounds like an ideal case for multi-dimensional scaling (MDS)
• SO LET’S GO FULL NERDY ON IT
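The slides do not show how the MDS map of tools was built. Below is a minimal sketch of classical (Torgerson) MDS in Python with NumPy, assuming each tool is described by a hypothetical one-hot profile of the Application, Interface and Price attributes listed on its slide and compared with Euclidean distance. The encoding, the six-tool subset and the names `TOOLS`, `FEATURES`, `pairwise_distances` and `classical_mds` are illustrative assumptions, not the authors' data or method; they may well have used another MDS variant (for example non-metric MDS).

```python
import numpy as np


def pairwise_distances(x: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(d: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: n x n distance matrix -> n x k coordinates."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)     # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    lam = np.clip(eigvals[top], 0.0, None)   # clip round-off negatives to zero
    return eigvecs[:, top] * np.sqrt(lam)


# Hypothetical one-hot profiles taken from the tool slides:
# application (stat, econ, mining), interface (menus, console, workflow, online), price (free)
FEATURES = ["stat", "econ", "mining", "menus", "console", "workflow", "online", "free"]
TOOLS = {
    "Excel Data Analysis": [1, 0, 0, 1, 0, 0, 0, 0],
    "PSPP":                [1, 0, 0, 1, 0, 0, 0, 1],
    "Gretl":               [0, 1, 0, 1, 1, 0, 0, 1],
    "R (+ RStudio)":       [1, 1, 1, 0, 1, 0, 0, 1],
    "KNIME":               [1, 0, 1, 0, 0, 1, 0, 1],
    "Google Colab":        [0, 0, 1, 0, 0, 0, 1, 1],
}

x = np.array(list(TOOLS.values()), dtype=float)
coords = classical_mds(pairwise_distances(x), k=2)
for name, (c1, c2) in zip(TOOLS, coords):
    print(f"{name:20s} {c1:6.2f} {c2:6.2f}")
```

Tools with similar profiles (here Excel Data Analysis and PSPP) land close together on the 2-D map, which is the kind of grouping the later selection tree summarizes.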
12. Excel Data Analysis
• Application: Statistical analysis
• Interface: Menus and windows
• Price: Licensed
• Pros: Availability (almost everybody has Excel)
• Cons: Works with selected cells, not with variable names
13. IBM SPSS Statistics
• Application: Statistical analysis; Econometric analysis
• Interface: Menus and windows; Command console
• Price: Licensed
• Pros: Very large set of analyses
• Cons: Non-interactive
14. PSPP
• Application: Statistical analysis
• Interface: Menus and windows
• Price: Free
• Pros: “Free” SPSS Statistics
• Cons: Relatively small set of analyses; Non-interactive
16. Gretl
• Application: Econometric analysis
• Interface: Menus and windows; Command console
• Price: Free
• Pros: Hansl scripting language; localized user manual
• Cons: Limit on the volume of data
18. Python
• Application: Statistical analysis; Econometric analysis; Data mining
• Interface: Command console
• Price: Free
• Pros: Global community developing libs
• Cons:
19. R (+ RStudio)
• Application: Statistical analysis; Econometric analysis; Data mining
• Interface: Command console
• Price: Free
• Pros: Global community developing libs
• Cons: A slightly weird language
20. Jupyter Notebook
• Application: Data mining
• Interface: Online platform
• Price: Free
• Pros: Industry standard for Data Science
• Cons:
23. JASP
• Application: Statistical analysis
• Interface: Menus and windows
• Price: Free
• Pros: Interactive
• Cons: Relatively small set of analyses
24. Weka
• Application: Statistical analysis; Data mining
• Interface: Graphical stream/workflow
• Price: Free
• Pros: One of the original revolutionaries
• Cons: outdated and clumsy
25. Rapid Miner
• Application: Statistical analysis; Data mining
• Interface: Graphical stream/workflow
• Price: Licensed
• Pros: Probably the most intuitive interface
• Cons:
26. KNIME
• Application: Statistical analysis; Data mining
• Interface: Graphical stream/workflow
• Price: Free
• Pros: Interactive
• Cons: Relatively small set of analyses
27. Orange
• Application: Data mining
• Interface: Graphical stream/workflow
• Price: Free
• Pros: Interactive
• Cons: Relatively small set of analyses
28. IBM SPSS Modeler
• Application: Econometric analysis; Data mining
• Interface: Graphical stream/workflow
• Price: Licensed
• Pros: Uses resources efficiently
• Cons: Not user-friendly when dealing with lots of features
29. MatLab Classification Learner
• Application: Data mining
• Interface: Graphical stream/workflow
• Price: Licensed
• Pros: Part of the Matlab environment
• Cons: Still under development to include more models
31. Microsoft Azure
• Application: Data mining
• Interface: Online platform
• Price: Licensed
• Pros: Many tools already available
• Cons: Could be a little hard to set up
32. IBM Watson Studio
• Application: Data mining
• Interface: Online platform
• Price: Licensed
• Pros: brand new
• Cons: Still some compatibility issues
33. Amazon ML
• Application: Data mining
• Interface: Online platform
• Price: Licensed
• Pros: Integrated with AWS S3 and can work in real time
• Cons: Still under development to include more models
34. Google Colab
• Application: Data mining
• Interface: Online platform
• Price: Free
• Pros: GPU computation via TensorFlow
• Cons: Sessions limited to 12 hours at a time
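36. Selection tree
• What type of problem do you solve? (Application)
• What type of interface would be suitable? (Workflow)
• Licensed or non-licensed? (Price)

| Application | Workflow | Price | Software |
|---|---|---|---|
| Statistical analysis | Menus and windows | Licensed | Excel Data Analysis, IBM SPSS Statistics |
| Statistical analysis | Menus and windows | Free | PSPP, JASP |
| Statistical analysis | Command console | Licensed | MatLab, IBM SPSS Statistics |
| Statistical analysis | Command console | Free | R (+ RStudio), Python |
| Statistical analysis | Graphical stream/workflow | Licensed | Rapid Miner |
| Statistical analysis | Graphical stream/workflow | Free | KNIME, Weka |
| Econometric analysis | Menus and windows | Licensed | eViews, IBM SPSS Statistics |
| Econometric analysis | Menus and windows | Free | Gretl |
| Econometric analysis | Command console | Licensed | eViews, IBM SPSS Statistics, MatLab |
| Econometric analysis | Command console | Free | Gretl, R (+ RStudio), Python |
| Econometric analysis | Graphical stream/workflow | Licensed | IBM SPSS Modeler |
| Data mining | Command console | Licensed | Matlab |
| Data mining | Command console | Free | R (+ RStudio), Python |
| Data mining | Graphical stream/workflow | Licensed | IBM SPSS Modeler, Rapid Miner, Matlab Classification App |
| Data mining | Graphical stream/workflow | Free | Orange, KNIME, Weka |
| Data mining | Online platform | Licensed | IBM Watson Studio, Microsoft Azure, Amazon ML |
| Data mining | Online platform | Free | Google Colab, Jupyter Notebook |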
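The three questions of the selection tree can be read as a lookup keyed on (Application, Workflow, Price). A minimal Python sketch follows, with the entries transcribed from the selection-tree slide; `SELECTION_TREE` and `select_tools` are hypothetical names, and treating a missing price answer as "either licence type" is an added convenience, not part of the slide.

```python
from typing import List, Optional

# Selection tree from the slide: (application, workflow, price) -> suggested tools.
SELECTION_TREE = {
    ("Statistical analysis", "Menus and windows", "Licensed"): ["Excel Data Analysis", "IBM SPSS Statistics"],
    ("Statistical analysis", "Menus and windows", "Free"): ["PSPP", "JASP"],
    ("Statistical analysis", "Command console", "Licensed"): ["MatLab", "IBM SPSS Statistics"],
    ("Statistical analysis", "Command console", "Free"): ["R (+ RStudio)", "Python"],
    ("Statistical analysis", "Graphical stream/workflow", "Licensed"): ["Rapid Miner"],
    ("Statistical analysis", "Graphical stream/workflow", "Free"): ["KNIME", "Weka"],
    ("Econometric analysis", "Menus and windows", "Licensed"): ["eViews", "IBM SPSS Statistics"],
    ("Econometric analysis", "Menus and windows", "Free"): ["Gretl"],
    ("Econometric analysis", "Command console", "Licensed"): ["eViews", "IBM SPSS Statistics", "MatLab"],
    ("Econometric analysis", "Command console", "Free"): ["Gretl", "R (+ RStudio)", "Python"],
    ("Econometric analysis", "Graphical stream/workflow", "Licensed"): ["IBM SPSS Modeler"],
    ("Data mining", "Command console", "Licensed"): ["Matlab"],
    ("Data mining", "Command console", "Free"): ["R (+ RStudio)", "Python"],
    ("Data mining", "Graphical stream/workflow", "Licensed"): ["IBM SPSS Modeler", "Rapid Miner", "Matlab Classification App"],
    ("Data mining", "Graphical stream/workflow", "Free"): ["Orange", "KNIME", "Weka"],
    ("Data mining", "Online platform", "Licensed"): ["IBM Watson Studio", "Microsoft Azure", "Amazon ML"],
    ("Data mining", "Online platform", "Free"): ["Google Colab", "Jupyter Notebook"],
}


def select_tools(application: str, workflow: str, price: Optional[str] = None) -> List[str]:
    """Walk the tree: application -> workflow -> price; price=None accepts both."""
    prices = ["Licensed", "Free"] if price is None else [price]
    tools: List[str] = []
    for p in prices:
        tools.extend(SELECTION_TREE.get((application, workflow, p), []))
    return tools


print(select_tools("Data mining", "Online platform", "Free"))
print(select_tools("Statistical analysis", "Menus and windows"))
```

Combinations that the table leaves empty, such as free graphical tools for econometric analysis, simply return an empty list.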
37. Q & A
• angel.marchev@datasciencesociety.net
• k_haralampiev@phls.uni-sofia.bg