An introduction to Machine Learning and how we used it at FreshBooks to automatically categorize our customers' expenses. Presented at the November 2015 ExploreTech Toronto meetup by Alex Vermeulen & Tobi Ogunbiyi.
2. What is Machine Learning?
A computer program is said to learn if its measured performance on a task improves with experience.
3. Machine Learning Applications
• Google’s self-driving car
• Optical Character Recognition (OCR)
• Google street view
• Facebook
4. Supervised Learning
The machine is “trained” using examples for which we know the correct answer.
- Labeled data
- Used for classification or prediction
5. Supervised Learning Example
• Features: shape, size, colour, and sound
• Labels: “cow”, “pig”, “chicken”, “llama”
6. Unsupervised Learning
Tries to find patterns and groupings by analyzing the characteristics of the data
• Unlabeled data
• Identifies patterns and groupings in the data
7. Unsupervised Learning
• No labels, just looking to group similar animals
• Features: shape, size, colour, and sound
16. Pre Processing
#5795# QTH Toronto ON
#991# Toronto ON
Roots #130 Etobicoke ON
True North Climbing Toronto ON
Tim Hortons
Eddie Bauer Canada
M9C
18. Pre Processing
QTH Toronto ON
Toronto ON
Roots Etobicoke ON
True North Climbing Toronto ON
Tim Hortons
Eddie Bauer Canada
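The before and after slides suggest that reference numbers such as "#5795#" and bare postal-code fragments such as "M9C" are stripped out. A minimal sketch of that kind of cleaning follows; the two regular expressions and the `clean_description` helper are assumptions inferred from the slides, not the rules FreshBooks actually used.

```python
import re

# Assumed rules, inferred from the slides: drop "#123#" / "#123" style
# reference numbers and three-character postal prefixes such as "M9C".
REFERENCE_NUMBER = re.compile(r"#\d+#?")
POSTAL_PREFIX = re.compile(r"\b[A-Z]\d[A-Z]\b")


def clean_description(text):
    """Remove noisy tokens and collapse the leftover whitespace."""
    text = REFERENCE_NUMBER.sub(" ", text)
    text = POSTAL_PREFIX.sub(" ", text)
    return " ".join(text.split())


raw = [
    "#5795# QTH Toronto ON",
    "#991# Toronto ON",
    "Roots #130 Etobicoke ON",
    "True North Climbing Toronto ON",
    "Tim Hortons",
    "Eddie Bauer Canada",
    "M9C",
]

# Lines that end up empty after cleaning (here, "M9C") are dropped.
cleaned = [c for c in (clean_description(r) for r in raw) if c]
print(cleaned)
```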
19. Vectorize
(i) Tim Hortons QTH Toronto ON
(ii) Eddie Bauer Canada Toronto ON

Term         (i)   (ii)
“tim”         1     0
“hortons”     1     0
“qth”         1     0
“toronto”     1     1
“on”          1     1
“eddie”       0     1
“bauer”       0     1
“canada”      0     1
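The vectorize step can be sketched in a few lines of plain Python. This is an illustrative version that assumes lowercasing and whitespace tokenization; `build_vocabulary` and `vectorize` are hypothetical helper names, not functions from the talk.

```python
def build_vocabulary(documents):
    """Collect each distinct lowercased token in first-seen order."""
    vocabulary = []
    for doc in documents:
        for token in doc.lower().split():
            if token not in vocabulary:
                vocabulary.append(token)
    return vocabulary


def vectorize(doc, vocabulary):
    """Return 1 for every vocabulary term present in the document, else 0."""
    tokens = set(doc.lower().split())
    return [1 if term in tokens else 0 for term in vocabulary]


documents = [
    "Tim Hortons QTH Toronto ON",     # document (i)
    "Eddie Bauer Canada Toronto ON",  # document (ii)
]
vocabulary = build_vocabulary(documents)
vectors = [vectorize(doc, vocabulary) for doc in documents]

print(vocabulary)
print(vectors[0])
print(vectors[1])
```

Both vectors share the same length and term order, which is what lets a classifier compare descriptions that contain different words.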
33. Transform
Term Frequency - Inverse Document Frequency
= term frequency x inverse document freq.
A numerical statistic that reflects how important or descriptive a term is to a single document in a collection.
term freq. = occurrences of term in document
inverse document freq. = log(total documents / docs containing the term)
36. Tf-Idf Example
(i) Tim Hortons QTH Toronto ON
(ii) Eddie Bauer Canada Toronto ON

tf(tim, d1) = occurrences of term = 1
idf(tim, D) = log(total docs / docs cont. term) = log(2 / 1) = 0.301
tfidf(tim, d1) = 1 x 0.301 = 0.301

Term         (i) count   (i) tf-idf
“tim”         1           .301
“hortons”     1           .301
“qth”         1           .301
“toronto”     1           0
“on”          1           0
“eddie”       0           0
“bauer”       0           0
“canada”      0           0
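The worked example can be reproduced with a short sketch. It assumes a base-10 logarithm, since log(2 / 1) = 0.301 on the slide, and rounds to three decimals for display; the function names are illustrative. Library implementations such as scikit-learn's TfidfVectorizer use a smoothed, natural-log variant with normalization by default, so their numbers will differ from these.

```python
import math


def tf(term, doc_tokens):
    """Term frequency: raw count of the term in one document."""
    return doc_tokens.count(term)


def idf(term, corpus_tokens):
    """Inverse document frequency: log10(total docs / docs containing term).

    Assumes the term occurs in at least one document of the corpus.
    """
    containing = sum(1 for tokens in corpus_tokens if term in tokens)
    return math.log10(len(corpus_tokens) / containing)


def tfidf_vector(doc_tokens, corpus_tokens, vocabulary):
    """Tf-idf weight of every vocabulary term for one document."""
    return [round(tf(t, doc_tokens) * idf(t, corpus_tokens), 3) for t in vocabulary]


corpus = [
    "tim hortons qth toronto on".split(),     # document (i)
    "eddie bauer canada toronto on".split(),  # document (ii)
]
vocabulary = ["tim", "hortons", "qth", "toronto", "on", "eddie", "bauer", "canada"]

d1 = tfidf_vector(corpus[0], corpus, vocabulary)
print(d1)
```

"toronto" and "on" appear in both documents, so their idf is log(2 / 2) = 0 and they carry no weight, while the terms unique to document (i) keep a weight of 0.301.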
53. Classify
• Multinomial Logistic Regression (supervised)
• Used for unordered, categorical outputs with more than 2 possible categories
• Series of linear sub-models which output a real number
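As a rough illustration of the "series of linear sub-models" idea: each category gets its own weighted sum over the feature vector, and a softmax turns those real-valued scores into probabilities. The weights, the three-feature input, and the category subset below are hypothetical values chosen for the sketch, not a trained model.

```python
import math


def linear_score(weights, bias, features):
    """One linear sub-model: a weighted sum of the features plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def softmax(scores):
    """Turn real-valued scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(model, features):
    """Score the features with every sub-model, then normalize."""
    labels = list(model)
    scores = [linear_score(*model[label], features) for label in labels]
    return dict(zip(labels, softmax(scores)))


# Hypothetical (weights, bias) per category, for illustration only.
model = {
    "Meals & Entertainment": ([2.0, 0.0, 0.0], 0.0),
    "Travel": ([0.0, 1.0, 0.0], 0.0),
    "Utilities": ([0.0, 0.0, 1.0], 0.0),
}

probabilities = predict_proba(model, [1.0, 0.0, 0.0])
print(probabilities)
```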
65. Refinement
• How do you know if your model is good enough?
• What can you do to improve your model?
  - Adjust the amount of data
  - Clean irrelevant parts of the data
  - Tweak parameters of the algorithm
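One common way to approach the "good enough" question is to measure accuracy on labeled examples the model did not see during training. The sketch below assumes that approach; the slides do not say which metric was used, and the labels are made up for illustration.

```python
def accuracy(predicted, actual):
    """Fraction of held-out examples whose predicted label is correct."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical held-out labels and model predictions.
held_out_actual = ["Meals & Entertainment", "Travel", "Utilities", "Travel"]
held_out_predicted = ["Meals & Entertainment", "Travel", "Personal", "Travel"]

score = accuracy(held_out_predicted, held_out_actual)
print(score)
```

Re-running the same measurement after each change (more data, cleaner data, different parameters) shows whether the change actually helped.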
72. Prediction
Tim Hortons 3335 QPS Toronto ON

Category                   Probability
“Advertising”              0.020
“Car & Truck Expenses”     0.018
“Meals & Entertainment”    0.710
“Personal”                 0.230
“Rent or Lease”            0.003
“Travel”                   0.011
“Utilities”                0.008

Threshold: 0.6
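A minimal sketch of how such a threshold might be applied, assuming the category is suggested only when the highest probability reaches the threshold and nothing is suggested otherwise. `suggest_category` is a hypothetical helper; the probabilities are the ones shown on the slide.

```python
def suggest_category(probabilities, threshold=0.6):
    """Return the most likely category, or None if it is below the threshold."""
    label, probability = max(probabilities.items(), key=lambda item: item[1])
    return label if probability >= threshold else None


# Probabilities from the slide for "Tim Hortons 3335 QPS Toronto ON".
probabilities = {
    "Advertising": 0.020,
    "Car & Truck Expenses": 0.018,
    "Meals & Entertainment": 0.710,
    "Personal": 0.230,
    "Rent or Lease": 0.003,
    "Travel": 0.011,
    "Utilities": 0.008,
}

suggestion = suggest_category(probabilities, threshold=0.6)
print(suggestion)
```

Raising the threshold trades coverage for confidence: fewer expenses get an automatic suggestion, but the ones that do are less likely to be wrong.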
75. Getting Started
• What kind of data do you have?
  - Labeled or unlabeled?
• What questions are you trying to answer?
  - Make predictions
  - Label or classify
  - Identify patterns or groupings
76. Lessons Learned
• Machines are intelligent, but not magicians
• It’s easy to know you’re wrong, but harder to know when you’re right
• Some people prefer to have control
77. Take away?
• Opens up new opportunities
• Potential to deliver amazing user experiences
• Machine Learning is fun!
79. Resources
• Interested in following along with FreshBooks Learnings?
  - medium.com/@freshbookspd
• Want to learn more about Machine Learning?
  - udacity.com
  - coursera.org
  - Python’s scikit-learn