Python là ngôn ngữ lập trình đơn giản và đang càng này càng trở lên phổ biến. Bài giảng này cung cấp cách tiếp cận đơn giản dễ hiểu với python một cách dễ dàng nhất
발표자: 김준호(Lunit)
발표일: 2018.1.
의료 AI 관련 중, nodule detection 문제에 대해 다뤄보고자 합니다.
의료 AI에서는 어떠한 방식으로 classication을 하고, preprocessing은 어떤식으로 진행되는지 LUNA16이라는 의료 challenge에 이용되는 데이터를 가지고 발표를 진행해보고자 합니다.
이후, 이 데이터를 이용해서 최근 2017 MICCAI (의료 영상학회에서는 높은 수준의 학회)에서 발표된 "curriculum adaptive sampling for extreme data imbalance"를 실제 구현해서 적용해보고 이 때 발생할 수 있는 문제를 어떤식으로 해결할 수 있는지에 대한 tip도 제공할 예정입니다. (Python multi-processing data load, input-pipeline)
위 논문을 선정한 이유는, 단순한 classification이 아닌, nodule이 있는 위치도 정확하게 catch하는 논문 중, performance가 상당히 높기 때문입니다.
In this show-and-tell session, we’ll find out how Neo4j can make us a professional League of Legends player. First, we’ll import game data into a Neo4j graph. Then we will push the limits of Cypher to uncover hidden secrets that lie in the depth of LoL.
• Which champions perform best as allies?
• What is the most effective strategy to advance your champion?
• What enemy team setup is the most dangerous?
This session covers a rarely seen graph domain (gaming) and will provide you will solid, factual advice on how to play LoL.
Python là ngôn ngữ lập trình đơn giản và đang càng này càng trở lên phổ biến. Bài giảng này cung cấp cách tiếp cận đơn giản dễ hiểu với python một cách dễ dàng nhất
발표자: 김준호(Lunit)
발표일: 2018.1.
의료 AI 관련 중, nodule detection 문제에 대해 다뤄보고자 합니다.
의료 AI에서는 어떠한 방식으로 classication을 하고, preprocessing은 어떤식으로 진행되는지 LUNA16이라는 의료 challenge에 이용되는 데이터를 가지고 발표를 진행해보고자 합니다.
이후, 이 데이터를 이용해서 최근 2017 MICCAI (의료 영상학회에서는 높은 수준의 학회)에서 발표된 "curriculum adaptive sampling for extreme data imbalance"를 실제 구현해서 적용해보고 이 때 발생할 수 있는 문제를 어떤식으로 해결할 수 있는지에 대한 tip도 제공할 예정입니다. (Python multi-processing data load, input-pipeline)
위 논문을 선정한 이유는, 단순한 classification이 아닌, nodule이 있는 위치도 정확하게 catch하는 논문 중, performance가 상당히 높기 때문입니다.
In this show-and-tell session, we’ll find out how Neo4j can make us a professional League of Legends player. First, we’ll import game data into a Neo4j graph. Then we will push the limits of Cypher to uncover hidden secrets that lie in the depth of LoL.
• Which champions perform best as allies?
• What is the most effective strategy to advance your champion?
• What enemy team setup is the most dangerous?
This session covers a rarely seen graph domain (gaming) and will provide you will solid, factual advice on how to play LoL.
The OSI Superboard II was the computer on which I first learned to program back in 1979. Python is why programming remains fun today. In this tale of old meets new, I describe how I have used Python 3 to create a cloud computing service for my still-working Superboard--a problem complicated by it only having 8Kb of RAM and 300-baud cassette tape audio ports for I/O.
A sprint thru Python's Natural Language ToolKit, presented at SFPython on 9/14/2011. Covers tokenization, part of speech tagging, chunking & NER, text classification, and training text classifiers with nltk-trainer.
NLTK - Natural Language Processing in Pythonshanbady
For full details, including the address, and to RSVP see: http://www.meetup.com/bostonpython/calendar/15547287/ NLTK is the Natural Language Toolkit, an extensive Python library for processing natural language. Shankar Ambady will give us a tour of just a few of its extensive capabilities, including sentence parsing, synonym finding, spam detection, and more. Linguistic expertise is not required, though if you know the difference between a hyponym and a hypernym, you might be able to help the rest of us! Socializing at 6:30, Shankar's presentation at 7:00. See you at the NERD.
The Agenda for the Webinar:
1. Introduction to Python.
2. Python and Big Data.
3. Python and Data Science.
4. Key features of Python and their usage in Business Analytics.
5. Business Analytics with Python – Real world Use Cases.
The basics of Python are rather straightforward. In a few minutes you can learn most of the syntax. There are some gotchas along the way that might appear tricky. This talk is meant to bring programmers up to speed with Python. They should be able to read and write Python.
En esta charla veremos con detalle algunas de las construcciones más pythonicas y las posibilidades de expresar de forma clara, concisa y elegante cosas que en otros lenguajes nos obligarían a dar muchos rodeos.
A veces es fácil olvidar algunos recursos como que una función puede devolver varios valores, cómo manipular listas y diccionarios de forma sencilla, contextos, generadores... En esta charla veremos de forma entretenida y práctica cómo mejorar nuestro nivel de Python "nativo".
A Development of Log-based Game AI using Deep LearningSuntae Kim
PyCon Korea 2018 발표자료입니다.
https://www.pycon.kr/2018/program/34
게임에서 AI는 빠질 수 없는 기능으로 그동안 다양한 장르와 플랫폼에서 사용되어 왔다.
특히 요즘 모바일 게임에서 자동 플레이 AI는 흔히 '노가다', '피로도'를 줄여주기 위한 매우 중요한 기능으로 자리잡고 있다. 하지만 지금까지의 자동 플레이 AI는 정해진 범위안에서 작동하는 FSM(Finite State Machine) 형태로 구현되다 보니 AI가 동작하는 경우의 수가 유한하고 한정적이라고 볼 수 있다. 때로는 이렇게 정해진 패턴의 AI가 유저들에게는 마치 로보트와 같은 느낌을 주기도 한다. 아무리 State를 추가하고 자연스럽게 구현해보려고 해도 어디까지 자연스럽게 해줘야 할 것인가에 대한 한계에 맞닥들이게 된다. 고려해야 할 경우의 수가 많기 때문이다.
하지만 이렇게 다양한 경우의 수를 로직으로 구현하지 않고 사용자가 플레이했던 데이터를 이용하여 학습시켜보면 어떨까? 이 호기심을 시작으로 LINE에서 자체 개발한 "리틀나이츠" 모바일 게임에 적용해보기로 했다. 게임 런칭 후 실제 사용자 플레이 로그를 수집하여 전처리하고 학습시켜서, 기획 의도에 맞게 유저가 Offline일 때 자신을 대신해서 플레이해 줄 수 있는 AI를 개발하였다. 이를 위해서 유저가 언제 어떤 카드를 선택했고 어디에 배치했는지, When, What, Where 3가지 상황에 대해서 학습시켰고 게임에 적용시켜 보았다.
단계별 과정을 간략하게 살펴보면, 먼저 로그 포멧을 게임 개발팀과 함께 정의했다. 두번째로 유저가 플레이했던 배틀 정보가 사전에 정의했던 로그 포맷 형태로 하둡에 쌓이게 했으며, 세번째로 Apache Spark을 이용하여 저장된 대용량 플레이 로그를 분산으로 전처리하여 데이터를 학습 가능한 형태로 가공하였다. 네번째로 AI 모델을 만들기 위한 뉴럴 네트워크를 설계하고, Python과 TensorFlow를 이용하여 데이터를 학습시켰다. 다섯번째로 학습에 반영되지 않는 순수한 테스트 데이터로 예측률을 구해본다. 이때 최적의 모델을 찾기 위해서 인내를 가지고 테스트하게 되는데, 먼저 하이퍼파라메터를 변경해보고 그래도 성능이 안나오면 뉴럴 네트워크와 데이터 전처리를 다양하게 변경해가며 테스트해 본다. 참고로 이러한 과정에 소비되는 Cost를 줄이고 싶다면 AutoML에 관심을 가져보아도 좋다. 여섯번째로 Python으로 개발된 AI 모델을 C# 기반의 유니티 환경에서 구동시키기 위해서 LineTensorFlow(가칭) 라이브러리를 개발해서 유니티 게임에 적용하였다.
발표 끝부분에서는 학습 지표를 공유하고, 알고리즘 기반의 AI와 딥러닝 기반의 AI에 대해서 플레이 비교 영상을 보고, 어떤 것이 딥러닝을 이용한 AI인지 맞춰보는 시간도 갖아본다. 이 발표에서는 로그 기반의 게임 AI가 개발되는 과정에서 파이썬이 어떻게 활용되었는지 살펴보고, 그 동안 겪었던 문제와 해결 방법에 대해서 공유하고자 한다.
Самые вкусные баги из игрового кода: как ошибаются наши коллеги-программисты ...DevGAMM Conference
Один из лучших способов снизить количество багов в играх – это показывать программистам, как не стоит писать код. В своём докладе я соберу самые вкусные и необычные ошибки, которые удалось найти в C++ и C# коде таких игр, как VVVVVV, Space Engineers, Command & Conquer, osu! и даже Doom. Я уверен, что каждый из слушателей обязательно узнает для себя что-то новое. В конце концов, это просто приятно – лично увидеть ошибки из кода знакомой и любимой игры!
Highlights a bunch of different Python tricks and tips - from the stupid to the awesome (and a bit of both).
See how to register a 'str'.decode('hail_mary') codec, call_functions[1, 2, 3] instead of call_functions(1, 2, 3), creating a "Clojure-like" threading syntax by overloading the pipe operator, create useful equality mocks by overloading the equality operator, ditch JSON for pySON and put together a tiny lisp based on Norvig's awesome article.
A slightly-modified version of my IPRUG talk, this time for the BT DevCon5 developer conference at Adastral Park on 25 May 2012.
The main changes are the addition of the Ruby section and the increased number of HHGTTG references in honour of towel day.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
2. Themes
• Easy!
• Short!
• General purpose tools – easily adaptable
• Great for teaching Python
• Hook young minds with truly interesting
problems.
3. Topics:
1. Exhaustive search using new itertools
2. Database mining using neural nets
3. Automated categorization with a naive Bayesian
classifier
4. Solving popular puzzles with depth-first and
breath-first searches
5. Solving more complex puzzles with constraint
propagation
6. Play a popular game using a probing search
strategy
4. Eight Queens – Six Lines
http://code.activestate.com/recipes/576647/
from itertools import permutations
n=8
cols = range(n)
for vec in permutations(cols):
if (n == len(set(vec[i]+i for i in cols))
== len(set(vec[i]-i for i in cols))):
print vec
6. • 'VIOLIN * 2 + VIOLA == TRIO + SONATA',
• 'SEND + A + TAD + MORE == MONEY',
• 'ZEROES + ONES == BINARY',
• 'DCLIZ + DLXVI == MCCXXV',
• 'COUPLE + COUPLE == QUARTET',
• 'FISH + N + CHIPS == SUPPER',
• 'SATURN + URANUS + NEPTUNE + PLUTO == PLANETS',
• 'EARTH + AIR + FIRE + WATER == NATURE',
• ('AN + ACCELERATING + INFERENTIAL + ENGINEERING + TALE + ' +
• 'ELITE + GRANT + FEE + ET + CETERA == ARTIFICIAL + INTELLIGENCE'),
• 'TWO * TWO == SQUARE',
• 'HIP * HIP == HURRAY',
• 'PI * R ** 2 == AREA',
• 'NORTH / SOUTH == EAST / WEST',
• 'NAUGHT ** 2 == ZERO ** 3',
• 'I + THINK + IT + BE + THINE == INDEED',
• 'DO + YOU + FEEL == LUCKY',
• 'NOW + WE + KNOW + THE == TRUTH',
• 'SORRY + TO + BE + A + PARTY == POOPER',
• 'SORRY + TO + BUST + YOUR == BUBBLE',
• 'STEEL + BELTED == RADIALS',
• 'ABRA + CADABRA + ABRA + CADABRA == HOUDINI',
• 'I + GUESS + THE + TRUTH == HURTS',
• 'LETS + CUT + TO + THE == CHASE',
• 'THATS + THE + THEORY == ANYWAY',
7. Alphametics Solver – Recipe 576647
def solve(s):
words = findall('[A-Za-z]+', s)
chars = set(''.join(words))
assert len(chars) <= 10
firsts = set(w[0] for w in words)
chars = ''.join(firsts) + ''.join(chars - firsts)
n = len(firsts)
for perm in permutations('0123456789', len(chars)):
if '0' not in perm[:n]:
trans = maketrans(chars, ''.join(perm))
equation = s.translate(trans)
if eval(equation):
print equation
8. Neural Nets for Data-Mining
• ASPN Cookbook Recipe: 496908
• Parallel Distributed Processing: IAC example
9. What is the core concept?
• A database can be modeled as a brain
(neural net)
• Unique field values in the database are neurons
• Table rows define mutually excitory
connections
• Table columns define competing inhibitory
connections
10. How do you use it?
• Provide a stimulus to parts of the neural net
• Then see which neurons get activated the
most
13. What can this do that SQL can’t?
• Make generalizations
• Survive missing data
• Extrapolate to unknown instances
14. Jets and Sharks example
Art Jets 40 jh sing pusher
Al Jets 30 jh mar burglar
Sam Jets 20 col sing bookie
Clyde Jets 40 jh sing bookie
Mike Jets 30 jh sing bookie
...
Earl Sharks 40 hs mar burglar
Rick Sharks 30 hs div burglar
Ol Sharks 30 col mar pusher
Neal Sharks 30 hs sing bookie
Dave Sharks 30 hs div pusher
15. Neuron for Every unique
value in the database
Art, Al, Sam, Jets, Sharks, 20, 30, 40,
pusher, bookie ...
• All neurons start in a resting state
• Excited by neural connections – defined by the
table rows
• Inhibited by other neurons in the same pool –
defined by table columns
• Can also be excited or inhibited externally –
probes into the database
19. Neuron class
class Unit(object):
def __init__(self, name, pool):
self.name = name
self.pool = pool
self.reset()
self.exciters = []
unitbyname[name] = self
def computenewact(self):
ai = self.activation
plus = sum(exciter.output for exciter in self.exciters)
minus = self.pool.sum - self.output
netinput = alpha*plus - gamma*minus + estr*self.extinp
if netinput > 0:
ai = (maxact-ai)*netinput - decay*(ai-rest) + ai
else:
ai = (ai-minact)*netinput - decay*(ai-rest) + ai
self.newact = max(min(ai, maxact), minact)
20. Pool class
class Pool(object):
__slots__ = ['sum', 'members']
def __init__(self):
self.sum = 0.0
self.members = set()
def addmember(self, member):
self.members.add(member)
def updatesum(self):
self.sum = sum(member.output for member in self.members)
21. Engine
def run(times=100):
quot;quot;quot;Run n-cycles and display resultquot;quot;quot;
for i in xrange(times):
for pool in pools:
pool.updatesum()
for unit in units:
unit.computenewact()
for unit in units:
unit.commitnewact()
print '-' * 20
for pool in pools:
pool.display()
22. What have we accomplished:
• 100 lines of Python models any database as
neural net
• Probing the brain reveals or confirms
hidden relationships
24. What is Mastermind?
• A codemaker picks a 4-digit code: (4, 3, 3, 7)
• The codebreaker makes a guess: (3, 3, 2, 4)
• The codemaker scores the guess: (1, 2)
• The codebreaker uses the information to make
better guesses
• There are many possible codebreaking strategies
• Better strategies mean that fewer guesses are
needed
25. What is the interesting part?
• Experimenting with different strategies to
find the best
• Finding ways to make it fast
26. The basic code takes only 30 lines:
import random
from itertools import izip, imap
digits = 4
fmt = '%0' + str(digits) + 'd'
searchspace = tuple([tuple(map(int,fmt % i)) for i in
range(0,10**digits)])
def compare(a, b, map=imap, sum=sum, zip=izip, min=min):
count1 = [0] * 10
count2 = [0] * 10
strikes = 0
for dig1, dig2 in zip(a,b):
if dig1 == dig2:
strikes += 1
count1[dig1] += 1
count2[dig2] += 1
balls = sum(map(min, count1, count2)) - strikes
return (strikes, balls)
27. (Continued)
def rungame(target, strategy, maxtries=15):
possibles = list(searchspace)
for i in xrange(maxtries):
g = strategy(i, possibles)
print quot;Out of %7d possibilities. I'll guess %rquot; % (len(possibles), g),
score = compare(g, target)
print ' ---> ', score
if score[0] == digits:
print quot;That's it. After %d tries, I won.quot; % (i+1,)
break
possibles = [n for n in possibles if compare(g, n) == score]
return i+1
29. Step 1: List out the search space
digits = 4
fmt = '%0' + str(digits) + 'd‘
searchspace = tuple([tuple(map(int,fmt % i)) for i in range(0,10**digits)])
30. This creates a sequence of all possible plays:
(0, 0, 0, 0),
(0, 0, 0, 1),
(0, 0, 0, 2),
...
(9, 9, 9, 9),
32. Step 3: Devise a strategy for choosing the next
move
def s_allrand(i, possibles):
'Simple strategy that randomly chooses one possibility'
return random.choice(possibles)
33. Step 4: Make an engine to run the game
• Start with a full search space
• Let the strategy pick a probe
• Score the result
• Pare down the search space using the score
35. • Out of 10000 possibilities. I'll guess (0, 0, 9, 3) ---> (0, 1)
• Out of 3052 possibilities. I'll guess (9, 4, 4, 8) ---> (0, 1)
• Out of 822 possibilities. I'll guess (8, 3, 8, 5) ---> (1, 0)
• Out of 123 possibilities. I'll guess (3, 3, 2, 4) ---> (1, 2)
• Out of 6 possibilities. I'll guess (4, 3, 3, 6) ---> (3, 0)
• Out of 2 possibilities. I'll guess (4, 3, 3, 1) ---> (3, 0)
• Out of 1 possibilities. I'll guess (4, 3, 3, 7) ---> (4, 0)
• That's it. After 7 tries, I won.
36. Step 6: Make it fast
• psyco gives a 10:1 speedup
• code the compare() function in C
37. Step 7: Experiment with a new strategy
• If possible, make a guess with no duplicates
def s_trynodup(i, possibles):
for j in xrange(20):
g = random.choice(possibles)
if len(set(g)) == digits:
break
return g
38. Step 8: Try a smarter strategy:
The utility of a guess is how well it divides-up the
remaining possibilities
def utility(play, possibles):
b = {}
for poss in possibles:
score = compare(play, poss)
b[score] = b.get(score, 0) + 1
return info(b.values())
39. (Continued)
The information content of that division is
given by Shannon’s formula
def info(seqn):
bits = 0
s = float(sum(seqn))
for i in seqn:
p=i/s
bits -= p * log(p, 2)
return bits
40. (Continued)
Select the probe with the greatest information content
def s_bestinfo(i, possibles):
if i == 0:
return s_trynodup(i, possibles)
plays = random.sample(possibles, min(20, len(possibles)))
_, play = max([(utility(play, possibles), play) for play in plays])
return play
41. Step 9: Make it fast
Instead of evaluating the utility of every move, pick the best of a small sample:
def s_samplebest(i, possibles):
if i == 0:
return s_trynodup(i, possibles)
if len(possibles) > 150:
possibles = random.sample(possibles, 150)
plays = possibles[:20]
elif len(possibles) > 20:
plays = random.sample(possibles, 20)
else:
plays = possibles
_, play = max([(utility(play, possibles), play) for play in plays])
return play
42. Step 10: Bask in the glow of your model:
• The basic framework took only 30 lines of code
• Four different strategies took another 30
• It’s easy to try out even more strategies
• The end result is not far from the theoretical
optimum
43. Sudoku-style Puzzles
• An exercise in constraint propagation
• ASPN Cookbook Recipe
• Google for: sudoku norvig
• Wikipedia entry: sudoku
45. What does a Sudoku puzzle look like when it is
solved
276|415|938
581|329|764
934|876|512
---+---+---
352|168|479
149|753|286
768|942|153
---+---+---
497|681|325
615|234|897
823|597|641
46. What is the interesting part?
• Use constraints to pare-down an enormous
search-space
• Finding various ways to propagate
constraints
• Enjoy the intellectual challenge of the
puzzles without wasting time
• Complete code takes only 56 lines
(including extensive comments)
47. Step 1: Choose a representation and a way to
display it
'53 7 6 195 98 68 6 34 83 17 2 66
28 419 5 8 79‘
def show(flatline):
'Display grid from a string (values in row major order
with blanks for unknowns)'
fmt = '|'.join(['%s' * n] * n)
sep = '+'.join(['-' * n] * n)
for i in range(n):
for j in range(n):
offset = (i*n+j)*n2
print fmt % tuple(flatline[offset:offset+n2])
if i != n-1:
print sep
48. Step 2: Determine which cells are in contact
with each other
def _find_friends(cell):
'Return tuple of cells in same row, column, or subgroup'
friends = set()
row, col = cell // n2, cell % n2
friends.update(row * n2 + i for i in range(n2))
friends.update(i * n2 + col for i in range(n2))
nw_corner = row // n * n3 + col // n * n
friends.update(nw_corner + i + j for i in range(n) for j in
range(0,n3,n2))
friends.remove(cell)
return tuple(friends)
friend_cells = map(_find_friends, range(n4))
49. Step 3: Write a solver
def solve(possibles, pending_marks):
# Apply pending_marks (list of cell,value pairs) to possibles (list of str).
# Mutates both inputs. Return solution as a flat string (values in row-major order)
# or return None for dead-ends where all possibilites have been eliminated.
for cell, v in pending_marks:
possibles[cell] = v
for f in friend_cells[cell]:
p = possibles[f]
if v in p:
p = possibles[f] = p.replace(v, '') # exclude value v from friend f
if not p:
return None # Dead-end: all possibilities eliminated
if len(p) == 1:
pending_marks.append((f, p[0]))
50. (Continued)
# Check to see if the puzzle is fully solved (each cell has only one
possible value)
if max(map(len, possibles)) == 1:
return ''.join(possibles)
# If it gets here, there are still unsolved cells
cell = select_an_unsolved_cell(possibles)
for v in possibles[cell]: # try all possible values for that cell
ans = solve(possibles[:], [(cell, v)])
if ans is not None:
return ans
51. Hey, what did that solver do again?
• Makes an assumption about a cell
• Goes to each of the friend_cells and
eliminates that possibility
• Check to see if the puzzle is solved
• If some cell goes down to one possibility,
repeat the process
• If some cells are unsolved, make another
assumption
52. Step 4: Pick a search strategy:
def select_an_unsolved_cell(possibles, heuristic=min):
# Default heuristic: select cell with fewest possibilities
# Other possible heuristics include: random.choice() and max()
return heuristic((len(p), cell) for cell, p in
enumerate(possibles) if len(p)>1)[1]
53. Here’s where it gets fun.
Step 5: Bask in the glow of your model:
• The basic framework took only 56 lines of code
• It’s easy to try-out alternative search heuristics
• The end result solves “hard” puzzles with very
little guesswork
• It’s easy to generate all possible solutions
• The framework can be extended for other ways to
propagate constraints
• See the wikipedia entry for other solution
techniques
54. Bayesian Classifier
• Google for: reverend thomas python
• Or link to: http://www.divmod.org
/projects/reverend
55. Principle
• Use training data (a corpus) to compute conditional
probabilities
• Given some attribute, what is the conditional probability of
a given classification
• Now, given many attributes, combine those probabilities
• Done right, you have to know joint probabilities
• But if you ignore those, it tends to work out just fine
56. Example
• Guess which decade a person was born
• Attribute 1 – the person’s name is Gertrude
• Attribute 2 – their favorite dance is the Foxtrot
• Attribute 3 – the person lives in the Suburbs
• Attribute 4 – the person drives a Buick Regal
• We don’t know the joint probabilities, but can
gather statistics on each on by itself.
57. Tricks of the trade
• Since we’ve thrown exact computation away, what is
the best approximation
• The simplest way is to multiply the conditional
probabilities
• The conditionals can be biased by a count of one to
avoid multiplying by zero
• The information content can be estimated using the
Claude Shannon formula
• The probabilities can be weighted by significance
• and so on . . .
58. Coding it in Python
• Conceptually simple
• Just count attributes in the corpus to
compute the conditional probabilities
• Just multiply (or whatever) the results for
each attribute
• Pick the highest probability result
• Complexity comes from effort to identify
attributes (parsing, tokenizing, etc)
59. Language classification example
>>> from reverend.thomas import Bayes
>>> guesser = Bayes()
>>> guesser.train('french', 'le la les du un une je il
elle de en')
>>> guesser.train('german', 'der die das ein eine')
>>> guesser.train('spanish', 'el uno una las de la en')
>>> guesser.train('english', 'the it she he they them
are were to')
>>> guesser.guess('they went to el cantina')spanish
>>> guesser.guess('they were flying planes')english
60. The hot topic
• Classifying email as spam or ham
• Simple example using reverend.thomas
• Real-world example with spambayes
61. Generic Puzzle Solving Framework
Depth-first and breadth-first tree
searches
• Link to: http://
users.rcn.com/python/download/puzzle.py
• Google for: puzzle hettinger
• Wikipedia: depth-first search
62. What does a generic puzzle look like?
• There an initial position
• There is a rule for generating legal moves
• There is a test for whether a state is a goal
• Optionally, there is a way to pretty print the
current state
66. Code for the puzzle
class GolfTee( Puzzle ):
pos = '011111111111111'
goal = '100000000000000'
triples = [[0,1,3], [1,3,6], [3,6,10], [2,4,7], [4,7,11], [5,8,12],
[10,11,12], [11,12,13], [12,13,14], [6,7,8], [7,8,9], [3,4,5],
[0,2,5], [2,5,9], [5,9,14], [1,4,8], [4,8,13], [3,7,12]]
def __iter__( self ):
for t in self.triples:
if self.pos[t[0]]=='1' and self.pos[t[1]]=='1' and self.pos[t[2]]=='0':
yield TriPuzzle(self.produce(t,'001'))
if self.pos[t[0]]=='0' and self.pos[t[1]]=='1' and self.pos[t[2]]=='1':
yield TriPuzzle(self.produce(t,'100'))
def produce( self, t, sub ):
return self.pos[:t[0]] + sub[0] + self.pos[t[0]+1:t[1]] +
sub[1] + self.pos[t[1]+1:t[2]] + sub[2] + self.pos[t[2]+1:]
def canonical( self ):
return self.pos
def __repr__( self ):
return 'n %sn %s %sn %s %s %sn %s %s %s
%sn%s %s %s %s %sn' % tuple(self.pos)
67. How does the solver work?
• Start at the initial position
• Generate legal moves
• Check each to see if it is a goal state
• If not, then generate the next legal moves
and repeat
68. Code for the solver
def solve( pos, depthFirst=0 ):
queue, trail = deque([pos]), {pos.canonical():None}
solution = deque()
load = depthFirst and queue.append or queue.appendleft
while not pos.isgoal():
for m in pos:
c = m.canonical()
if c in trail: continue
trail[c] = pos
load(m)
pos = queue.pop()
while pos:
solution.appendleft( pos)
pos = trail[pos.canonical()]
return solution
83. How about Jug Filling Puzzle
Given a two empty jugs with 3 and 5 liter
capacities and a full jug with 8 liters, find a
sequence of pours leaving four liters in the
two largest jugs
84. What does the code look like?
class JugFill( Puzzle ):
pos = (0,0,8)
capacity = (3,5,8)
goal = (0,4,4)
def __iter__(self):
for i in range(len(self.pos)):
for j in range(len(self.pos)):
if i==j: continue
qty = min(self.pos[i], self.capacity[j] - self.pos[j])
if not qty: continue
dup = list( self.pos )
dup[i] -= qty
dup[j] += qty
yield JugFill(tuple(dup))
85. What does the solution look like:
(0, 0, 8)
(0, 5, 3) Pour the 8 into the 5 until it is full leaving 3
(3, 2, 3) Pour the 5 into the 2 until it is full leaving 2
(0, 2, 6) Pour the 3 into the other the totaling 6
(2, 0, 6) Pour the 2 into the empty jug
(2, 5, 1) Pour the 6 into the 5 leaving 1
(3, 4, 1) Pour the 5 into the 3 until full leaving 4
(0, 4, 4) Pour the 1 into the 3 leaving 4
86. How about a black/white marble jumping
puzzle?
Given eleven slots in a line with four white marbles
in the leftmost slots and four black marbles in the
rightmost, make moves to put the white ones on
the right and the black on the left. A valid move
for a while marble isto shift right into an empty
space or hop over a single adjacent black marble
into an adjacent empty space -- don't hop over
your own color, don't hop into an occupied space,
don't hop over more than one marble. The valid
black moves are in the opposite direction.
Alternate moves between black and white marbles.
87. (Continued)
In the tuple representation below, zeros are
open holes, ones are whites, negative ones
are blacks, and the outer tuple track
whether it is whites move or blacks.
88. The code is straight-forward:
pos = (1,(1,1,1,1,0,0,0,-1,-1,-1,-1))
goal = (-1,-1,-1,-1,0,0,0,1,1,1,1)
def isgoal( self ):
return self.pos[1] == self.goal
def __iter__( self ):
(m,b) = self.pos
for i in range(len(b)):
if b[i] != m: continue
if 0<=i+m+m<len(b) and b[i+m] == 0:
newmove = list(b)
newmove[i] = 0
newmove[i+m] = m
yield MarblePuzzle((-m,tuple(newmove)))
elif 0<=i+m+m<len(b) and b[i+m]==-m and b[i+m+m]==0:
newmove = list(b)
newmove[i] = 0
newmove[i+m+m] = m
yield MarblePuzzle((-m,tuple(newmove)))
93. (Continued)
def __iter__( self ):
dsone = self.pos.find('0')
dstwo = self.pos.find('0',dsone+1)
for dest in [dsone, dstwo]:
for adj in [-4,-1,1,4]:
if (dest,adj) in self.block: continue
piece = self.pos[dest+adj]
if piece == '0': continue
newmove = self.pos.replace(piece, '0')
for i in range(20):
if 0 <= i+adj < 20 and self.pos[i+adj]==piece:
newmove = newmove[:i] + piece + newmove[i+1:]
if newmove.count('0') != 2: continue
yield PaPuzzle(newmove)
94. What have we accomplished?
• A 36 line generic puzzle solving framework
• Easy adapted to a huge variety of puzzles
• The fun part is writing the move generator
• Everything else is trivial (initial position,
goal state, repr function)
95. Optimizing
• Fold symmetries into a single canonical
states (don’t explore mirror image
solutions)
• Use collections.deque() instead of a list()
• Decide between depth-first and breadth-
first solutions. (first encountered vs shortest
solution)