WebLEAP/DSR is a new implementation of the WebLEAP tool that helps English language learners learn usage by analyzing web corpora. It allows users to input sentences and see frequency graphs and keyword-in-context examples from search engines. The tool also allows domain specification to focus analysis. Examples show how it can help estimate appropriate prepositions and compare differences between UK and US English. The system records user interactions for computer-assisted language learning. Further research topics include improving precision and analyzing regional differences and collaborative writing.
1. Learning Usage of English KWICly
with WebLEAP/DSR
Takashi Yamanoue
Kagoshima University, Japan
Toshiro Minami
Kyushu Institute of Information Sciences &
Kyushu University, Japan
Ian Ruxton
Kyushu Institute of Technology, Japan
Wataru Sakurai
University of Tsukuba, Japan
ICITA2004@Harbin(04.01.08-11)
2. Contents
I. Motivation (Introduction)
II. WebLEAP/DSR: A New
implementation of WebLEAP
III. Examples and Experiments for
Evaluation
IV. Related Work
V. Concluding Remarks
3. Difficulties in writing in English
Is it really used?
•Spelling
→ spell checker
•Grammar
→ grammar checker
•Usage
→ Corpus Linguistic Tools
I. Motivation
4. Problems of Ordinary
Corpus Linguistic Tools
Time-consuming
Needs hard work in order to make a
good corpus
Copyright problem
(often) Outdated from the beginning
Tools are mainly for experts
Difficult to use for ordinary learners
5. A Solution
Use of “Web-Corpus”
= Using Web Documents as a Corpus
Maintenance free: Exists as it is
Always new, reflects current status of
languages
A lot of applications/services are available on
the Internet
6. WebLEAP
Shows
Frequencies of phrases in the given
sentence graphically
Using a search engine.
New Features
KWIC (Key Word In Context)
Domain Specification
8. II. WebLEAP/DSR: A New
implementation of WebLEAP
DSR: Distributed System Recorder
A Computer Assisted Teaching System,…
Recording and replaying every operation.
Draw, Programming, Web, WebLEAP, …
WebLEAP Basic:
Frequencies Graphically
New(using Google Web APIs):
KWIC
Domain Specification
19. Satoh’s system … webcorpus
SUIKO…detects wrong sentences
Applications using Google Web APIs
DSR: Distributed System Recorder
A Benchmarking tool for distributed
systems.
A Computer Assisted Teaching system
P2P, reliable multicast, …
IV. Related Work
20. WebLEAP:
A tool for helping with writing.
Popularities of expressions.
Frequencies from a Search engine.
KWIC
How the expression is used.
Filling in the missing word.
Domain specification
WebLEAP/DSR
An application of DSR.
V. Concluding Remarks
21. Precision
Discriminating native speakers' expressions
from non-native speakers'.
Differences from region to region
Collaborative Writing
Further Research
Topics
(Thank you, Mr. Chairman)
I’m Takashi Yamanoue from Kagoshima University, Japan.
I would like to talk about “Learning Usage of English KWICly with WebLEAP/DSR”.
This talk consists of an introduction; WebLEAP/DSR, a new implementation of WebLEAP;
examples and experiments for evaluation; related work; and concluding remarks.
Writing is hard work, and it is even harder in a second language. We often cannot judge whether a sentence is appropriate. We already have spell checkers and grammar checkers. However, an expression can be grammatically correct and yet never be used by native speakers. A corpus and a concordance program help us in such cases.
A corpus is a large number of sample sentences.
Making a corpus is time-consuming.
Hard work is needed in order to make a good corpus,
and to solve copyright problems.
The corpus is often outdated from the beginning.
Concordancers, tools for using the corpus, are mainly for experts.
Many of them are difficult to use for ordinary learners.
In order to solve these problems, we use web documents as a corpus.
We call this kind of corpus a ‘Web-corpus’.
The Web-corpus is maintenance free. It exists as it is.
It is always new, and it reflects the current state of the language.
A lot of applications/services are available on the Internet.
WebLEAP is a program which shows frequencies of the phrases in the given sentence to the user graphically.
We added two new features to WebLEAP.
One is KWIC (Key Word In Context). The other is domain specification.
This figure shows the internal structure of WebLEAP.
The sentence given by the user is decomposed into phrases by this word sequence generator.
These phrases are sent to a search engine.
The search engine returns the corresponding result pages, which include the frequency of each phrase.
The frequency is extracted by the document analyzer.
These frequencies are shown to the user graphically by the user interface.
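The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the WebLEAP code: the talk does not say exactly how the word sequence generator splits a sentence, so the sketch assumes it produces every contiguous word sequence of two or more words. The names `word_sequences`, `phrase_frequencies`, `fake_lookup` and `SAMPLE_COUNTS` are hypothetical, and `fake_lookup` stands in for the search engine and document analyzer.

```python
def word_sequences(sentence, min_len=2):
    """Return every contiguous word sequence of at least min_len words.

    Assumption: the real word sequence generator may split differently.
    """
    words = sentence.split()
    phrases = []
    for length in range(min_len, len(words) + 1):
        for start in range(len(words) - length + 1):
            phrases.append(" ".join(words[start:start + length]))
    return phrases


def phrase_frequencies(sentence, lookup):
    """Map each phrase of the sentence to the frequency reported by lookup."""
    return {phrase: lookup(phrase) for phrase in word_sequences(sentence)}


# Stand-in for the search engine plus document analyzer (hypothetical).
# The single count below is the one quoted later in this talk.
SAMPLE_COUNTS = {"at your own risk": 434000}


def fake_lookup(phrase):
    """Return a canned hit count, or 0 for phrases not in the sample."""
    return SAMPLE_COUNTS.get(phrase, 0)


if __name__ == "__main__":
    for phrase, count in phrase_frequencies("at your own risk", fake_lookup).items():
        print(f"{count:>8}  {phrase}")
```

Passing the lookup as a parameter keeps the phrase generation separate from whichever search engine is selected in the setting window.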
WebLEAP/DSR is a new implementation of WebLEAP.
DSR is a distributed system recorder. It can be used as a computer-assisted teaching system and as a benchmarking tool for distributed systems.
It can record and replay users’ operations on DSR application programs running on a distributed system.
WebLEAP/DSR is an application program of DSR.
By using the Google Web APIs, a web service from Google,
it can show a KWIC table for a phrase. It can also specify the domain of the source sentences.
This is the WebLEAP window of WebLEAP/DSR.
It is used to input the sentence and settings, and to control the outputs.
When the user clicks this [eval] button after entering a sentence in this field, the draw window is shown.
This is the draw window. This window shows the frequencies of phrases in the input sentence graphically.
A number in the colored bar shows the frequency of the phrase over the bar.
A pink bar shows a low frequency. A blue bar shows a high frequency.
When the user clicks a bar, for example this bar, the KWIC window is shown.
This is the KWIC window. This window shows the KWIC table.
In these fields, the keyword which corresponds to the bar clicked in the draw window
is shown in bold letters.
We can see how the keyword is used in the context.
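A KWIC table of this kind can be sketched as below. This is an illustrative sketch only: WebLEAP/DSR obtains its contexts through the Google Web APIs, whereas this example works on plain strings already held in memory. The function name `kwic` and the `width` parameter are hypothetical.

```python
def kwic(texts, keyword, width=30):
    """Return (left context, keyword, right context) rows for each match.

    Matching is case-insensitive; width is the number of characters
    kept on each side of the keyword.
    """
    rows = []
    key = keyword.lower()
    for text in texts:
        lowered = text.lower()
        pos = lowered.find(key)
        while pos != -1:
            end = pos + len(keyword)
            left = text[max(0, pos - width):pos]
            right = text[end:end + width]
            rows.append((left, text[pos:end], right))
            pos = lowered.find(key, pos + 1)
    return rows


if __name__ == "__main__":
    sample = ["Use this software at your own risk."]
    for left, key, right in kwic(sample, "your own risk", width=10):
        # Right-align the left context so the keywords line up in a column.
        print(f"{left:>10} [{key}] {right}")
```

Aligning the keyword in a fixed column is what lets the reader scan the words immediately before and after it.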
When the user clicks a URL field, for example this field, the web browser window is shown.
This is the web browser window.
The page in this window includes the keyword and shows the context like this.
This is the setting window. This window is shown when the user clicks the setting button in the WebLEAP window.
We can select the search engine used in the evaluation and set its search options.
In this figure, we have selected Google as the search engine and are about to set the
search domain as a search option for Google.
We have experimented with a variety of cases.
Let's have a look at two of them.
One is estimating the appropriate preposition. The other is comparing English in specific countries.
Let’s think about which preposition is the most appropriate for “your own risk”.
Is it by? With? At?
This figure shows the frequencies of “by your own risk”, “with your own risk” and “at your own risk”.
The frequencies are 41, 138 and 434000. It is easy to see that “at your own risk” is the most appropriate one.
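The comparison can be written out as a short calculation. The three counts are the ones quoted above; the helper names `most_frequent` and `shares` are hypothetical and only illustrate the arithmetic.

```python
# Frequencies quoted in the talk for the three candidate phrases.
counts = {
    "by your own risk": 41,
    "with your own risk": 138,
    "at your own risk": 434000,
}


def most_frequent(counts):
    """Return the phrase with the highest frequency."""
    return max(counts, key=counts.get)


def shares(counts):
    """Return each phrase's share of the combined frequency."""
    total = sum(counts.values())
    return {phrase: count / total for phrase, count in counts.items()}


if __name__ == "__main__":
    print("most frequent:", most_frequent(counts))
    for phrase, share in shares(counts).items():
        print(f"{share:8.4%}  {phrase}")
```

With counts this far apart the choice is obvious, but the same comparison applies when the candidates are closer.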
Now suppose that “at” did not come to mind at first.
This figure shows that the frequencies of “by your own risk” and “with your own risk” are far too small compared with the frequency
of “your own risk”. So we click the frequency bar which corresponds to “your own risk”.
Then this KWIC window is shown. The KWIC table shows how “your own risk” is used in each context.
In this table, “at” is used in most cases.
We can then ask for the frequency of “at your own risk” and confirm that it is the most appropriate expression.
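Finding the missing word from the KWIC table amounts to counting which word most often comes directly before the keyword. A minimal sketch follows. In the talk this step is done by reading the KWIC table, so the automatic counting here is only an illustration. The function name `preceding_words` is hypothetical and the sample contexts are invented, not taken from the experiment.

```python
from collections import Counter


def preceding_words(contexts, phrase):
    """Count the word that comes directly before each occurrence of phrase."""
    counts = Counter()
    key = phrase.lower()
    for text in contexts:
        lowered = text.lower()
        pos = lowered.find(key)
        while pos != -1:
            before = lowered[:pos].split()
            if before:
                counts[before[-1]] += 1
            pos = lowered.find(key, pos + 1)
    return counts


if __name__ == "__main__":
    # Invented sample contexts, for illustration only.
    contexts = [
        "Use at your own risk.",
        "Proceed at your own risk",
        "Travel with your own risk assessment",
    ]
    print(preceding_words(contexts, "your own risk").most_common())
```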
Non-native English speakers are sometimes confused when writing in a specific variety of English, such as British English or American English.
The WebLEAP/DSR has the ability to filter the Web corpus by a domain name in the page’s URL.
This figure shows WebLEAP outputs comparing the two phrases “living in a flat” and “living in an apartment” in the UK domain and the US domain.
It shows that “living in a flat” is used much more than “living in an apartment” in the UK domain,
and that “living in an apartment” is used much more than “living in a flat” in the US domain.
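Filtering by a domain name in the page's URL can be illustrated as follows. This is a sketch under assumptions: the talk does not say how WebLEAP/DSR applies the filter (it sets the search domain as a search engine option), so this client-side check on result URLs is only a stand-in. The function names `in_domain` and `filter_by_domain` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def in_domain(url, suffix):
    """Return True if the URL's host is the domain suffix or ends with it."""
    host = urlparse(url).hostname or ""
    suffix = suffix.lower().lstrip(".")
    return host == suffix or host.endswith("." + suffix)


def filter_by_domain(urls, suffix):
    """Keep only the URLs whose host falls under the given domain suffix."""
    return [url for url in urls if in_domain(url, suffix)]


if __name__ == "__main__":
    # Hypothetical result URLs, for illustration only.
    urls = [
        "http://www.example.co.uk/flats",
        "http://www.example.com/apartments",
        "http://city.example.us/housing",
    ]
    print(filter_by_domain(urls, "uk"))
    print(filter_by_domain(urls, "us"))
```

Matching on a leading dot avoids counting a host such as `notuk.example` as part of the UK domain.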
Satoh’s system is similar to our system in the sense that it also uses Web documents through a search engine. This system outputs the KWIC index of a keyword, whereas our system outputs not only KWIC but also a graphical representation of the frequencies of words or phrases. WebLEAP can also specify the domain of the web-corpus.
SUIKO detects incorrect Japanese sentences. It does not show whether an expression is actually used or not.
There are other applications which use the Google Web APIs. Most of them provide only an interface to the search engine.
DSR is a distributed system recorder. It can be used as a benchmarking tool and as a computer-assisted teaching system.
WebLEAP is a tool for helping with writing by showing the user how popular expressions are. It uses a search engine to get the frequencies of the expressions.
We added two new features to WebLEAP.
One is KWIC and the other is domain specification.
By using KWIC, the user can see how the given expression is used. The user can also fill in a missing word using KWIC.
WebLEAP/DSR is an application of DSR.
In the next step of this research, we would like to improve the precision of
WebLEAP and to make WebLEAP support collaborative writing.
We’d like to distinguish native speakers’ expressions from non-native speakers’.
We’d like to know more precisely how sentences differ from region to region.
We thank google.com for making the Google Web APIs public and letting us use them.