INTRO TO THE
DATA SCIENCE CLUB
16th OCTOBER
2017
TABLE OF CONTENTS
INTRO
WHY DATA SCIENCE CLUB AND WHAT ARE THE GOALS?
INTRO TO DATA SCIENCE
AI research isn't just in Silicon Valley
- Alan Turing, born in London
- DeepMind, founded in London
- Amazon AI labs, Berlin
- Yandex, Russia
- Baidu, China
- ...
Why Data Science Club?
- Connect business and academia for mutual benefit
- Business
  - Dedicated to adding value
  - Needs access to research, talent and innovation
- Academia
  - Dedicated to research and teaching
  - Can't do all the research and teaching alone
Exponea is a company with ...
- A team of software engineers who work on applied AI
  - 80% from FIIT and Matfyz
- Data and interesting clients from all around the world
- Lots of interesting problems to solve
- Lots of relevant knowledge to share
STU FIIT is an educational institution with ...
- The Intelligent Software Systems study programme
  - Software Engineering
  - Artificial Intelligence
- Relevant research and a team of experienced researchers
- Lots of graduates with careers related to data science
- Great facilities where all of us can meet
What is Data Science Club?
- A community of like-minded people with an interest in
  data, analytics, software engineering and AI
- Regular meetups where we share knowledge and
  work on our challenges together
When?
- 6 times per semester, in both summer and winter
- Mondays starting at 16:00, with a 3-hour agenda:
  - Expert talk(s)
  - Practical workshop
  - Networking, consulting, discussions
Draft of Agenda
Winter
1. Intro to Data Science
2. Data storage
3. SQL for data analytics
4. Map-Reduce
5. Stream data processing
6. Data visualisation
Summer
1. Applications of machine learning
2. Process of machine learning
3. Classification
4. Recommendation engines
5. Bandit algorithms
6. Reinforcement learning
The final agenda depends on your interest and the availability of expert speakers
Goals
- Build an active community
  - 3+ universities, 10+ companies, 100+ members
- Create and share content
  - Reach 2000+ viewers
- Contribute to research and applications
  - 5+ publications by B&A authors in 2017/18
  - 5+ new features in software products
People
- Jozo Kovac, Cofounder, CTO
- Matus Cimerman, Head of AI
- Peter Kovacs, AISW Engineer
- Ondrej Brichta, AISW Engineer
- Jakub Macina, AISW Engineer
- Robert Lacok, AISW Engineer
- Dalibor Meszaros, AISW Engineer
- Lucia Siebestichova, Event Manager
- Martina Kolibasova, QA Engineer
Contacts
- FB Page: https://www.facebook.com/ExponeaSocietyBratislava/
- Slack: https://community.exponea.com/
- Wiki: https://github.com/exponea/data-science-club/wiki
- Git: https://github.com/exponea/data-science-club
- Email: matus.cimerman@exponea.com
MEET
OUR
TEAM
MY SEARCH FOR
A FAMILY VACATION
LET'S START WITH A REAL-WORLD PROBLEM
Challenges
- We pay for traffic. How do we increase customer value?
- Can customer history improve the quality of search?
- Search must be fast: render everything in under 100 ms
A quick analysis
- A learning-to-rank problem
https://www.slideshare.net/MrChrisJohnson/interactive-recommender-systems-with-netflix-and-spotify
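To make the idea concrete, here is a minimal sketch of where customer history could enter a ranking. It is not a learned model: the `Offer` fields, the `text_match` score and the fixed `history_weight` are all hypothetical, and in a real learning-to-rank setup the weight would be learned from click or booking data rather than set by hand.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    offer_id: str
    text_match: float  # hypothetical query relevance in the range 0..1
    destination: str


def rerank(offers, visited_destinations, history_weight=0.5):
    """Sort offers by text relevance plus a bonus for familiar destinations."""
    def score(offer):
        # Assumed signal: the customer has looked at this destination before.
        bonus = history_weight if offer.destination in visited_destinations else 0.0
        return offer.text_match + bonus

    # sorted() is stable, so offers with equal scores keep their input order.
    return sorted(offers, key=score, reverse=True)


offers = [
    Offer("a", 0.9, "Spain"),
    Offer("b", 0.7, "Croatia"),
    Offer("c", 0.6, "Greece"),
]
ranked = rerank(offers, visited_destinations={"Croatia"})
print([o.offer_id for o in ranked])
```

With an empty history the order falls back to plain text relevance, which makes it easy to compare both rankings side by side.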
Designing Data-Intensive Applications
WORKSHOP
16th OCTOBER
2017
Challenges - step by step
• Collect data
• Preprocess
• Train a model
• Evaluate
• Deploy
• Improve the model
(Diagram labels: Data, Model, Data*, Insight)
Collect data
• If you have it, you’re done
• If not, find a way to collect it
Preprocess
• What:
  • Select features
  • Normalize
  • Fill missing values
  • Aggregate...
• How:
  • Store
  • Scale out processing
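Two of the steps above, filling and normalizing, can be sketched in a few lines. This version uses only the Python standard library so it runs anywhere; the workshop itself points to pandas and scikit-learn for the same job. The `prices` column is made-up example data, and mean imputation with min-max scaling is just one reasonable choice.

```python
def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


prices = [200.0, None, 400.0, 600.0]  # hypothetical feature column
filled = fill_missing(prices)
scaled = min_max_normalize(filled)
print(filled, scaled)
```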
Create a model
• Learn about your domain
• Go really simple first
• Don’t reinvent the wheel
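As one possible reading of "go really simple first", a popularity baseline can rank items by smoothed click-through rate before any machine learning is involved. The item statistics and the prior values below are assumptions chosen for illustration; any later model should at least beat this baseline.

```python
def smoothed_ctr(clicks, impressions, prior_clicks=1.0, prior_impressions=20.0):
    """Click-through rate with additive smoothing.

    The prior pulls items with little data towards prior_clicks / prior_impressions,
    so an item with 2 clicks out of 10 views is not trusted as much as
    one with 200 clicks out of 1000 views.
    """
    return (clicks + prior_clicks) / (impressions + prior_impressions)


# Hypothetical (clicks, impressions) per item.
stats = {"a": (30, 1000), "b": (2, 10), "c": (0, 0)}

ranking = sorted(stats, key=lambda item: smoothed_ctr(*stats[item]), reverse=True)
print(ranking)
```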
Evaluate
• What metrics make sense?
• Test sets vs live data
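For a ranking problem, one metric that often makes sense on a held-out test set is NDCG@k, which rewards placing relevant results near the top. The sketch below assumes each result already has a relevance label (for example 1 for clicked, 0 for not clicked); how those labels are obtained is outside its scope.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels in the order the model ranked the results.
print(ndcg_at_k([1, 0, 0], k=3))  # relevant result ranked first
print(ndcg_at_k([0, 0, 1], k=3))  # relevant result ranked last
```

Offline scores like this are only a proxy. Live data, for example an A/B test, shows whether the ranking actually changes customer behaviour.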
Deploy
• Online API vs batch
• Latency matters
• Reliable
• Monitor & evaluate
• Retrain on new data
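The online-API option can be sketched as a tiny WSGI application. The workshop uses Flask for this step; the version below sticks to the standard library (`wsgiref`) so it has no dependencies, and the `score` function is a hypothetical stand-in for a trained model.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def score(query):
    """Hypothetical stand-in for model inference."""
    return len(query) / 10.0


def app(environ, start_response):
    """Answer GET /?q=... with a JSON body containing the model score."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0]
    body = json.dumps({"query": query, "score": score(query)}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8000):
    """Run a local development server; call this manually to try the API."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

Because `app` is a plain callable, it can be exercised in tests without opening a socket, which also makes it easy to time a single request against a latency budget.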
Improve the model
• Read more papers and blogs. Is anyone doing it better?
• Lead, innovate and be the best
Workshop
1. Set up your environment: dependencies, template & data
2. Preprocess: clean, normalize, select features (pandas/scikit-learn)
3. Train: Choose a model of your liking (scikit-learn)
4. Deploy: Create a REST API (flask)

Introduction to data science club
