This document discusses various natural language processing techniques in Python, including summarizing and extracting text from PDFs, Word documents, and web pages. It also covers generating n-grams, extracting noun phrases, calculating text similarity, phonetic matching, part-of-speech tagging, named entity recognition, sentiment analysis, word sense disambiguation, speech recognition, text-to-speech, and voice translation. A variety of Python libraries are used such as NLTK, TextBlob, BeautifulSoup, and gTTS. Example code is provided for scraping movie data from IMDB and analyzing it.
The presentation describes how to install NLTK and covers the basics of text processing with it. The slides were meant to support the talk and may not contain much detail. Many of the examples in the slides are from the NLTK book (http://www.amazon.com/Natural-Language-Processing-Python-Steven/dp/0596516495/ref=sr_1_1?ie=UTF8&s=books&qid=1282107366&sr=8-1-spell ).
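For reference, a minimal sketch of the kind of setup and first steps such a talk covers (a standard NLTK install is assumed; the corpus/model names are the usual NLTK ones, not taken from the slides):
# pip install nltk
import nltk
nltk.download('punkt')                      # tokenizer models
nltk.download('averaged_perceptron_tagger') # POS-tagger model

text = "NLTK makes basic text processing easy."
tokens = nltk.word_tokenize(text)   # split into word tokens
print(tokens)
print(nltk.pos_tag(tokens))         # tag each token with its part of speech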
PHP 7 – What changed internally? (Forum PHP 2015) – Nikita Popov
One of the main selling points of PHP 7 is greatly improved performance, with many real-world applications now running twice as fast… But where do these improvements come from?
At the core of PHP 7 lies an engine rewrite with a focus on improving memory usage and performance. This talk provides an overview of the most significant changes, briefly covering everything from data-structure changes, through enhancements in the executor, to the new compiler implementation.
A sprint thru Python's Natural Language ToolKit, presented at SFPython on 9/14/2011. Covers tokenization, part of speech tagging, chunking & NER, text classification, and training text classifiers with nltk-trainer.
As a result of an engine rewrite with focus on more efficient data structures, PHP 7 offers much improved performance and memory usage. This session describes important aspects of the new implementation and how it compares to PHP 5. A particular focus will be on the representation of values, arrays and objects.
PHP 7 – What changed internally? (PHP Barcelona 2015) – Nikita Popov
One of the main selling points of PHP 7 is greatly improved performance, with many real-world applications now running twice as fast… But where do these improvements come from? At the core of PHP 7 lies an engine rewrite with a focus on improving memory usage and performance. This talk provides an overview of the most significant changes, briefly covering everything from data-structure changes, through enhancements in the executor, to the new compiler implementation.
Slides from my GeeCON 2014 Prague talk:
"Groovy is a dynamic language that provides different types of metaprogramming techniques. In this talk we’ll mainly see runtime metaprogramming. I’ll explain Groovy Meta-Object-Protocol (MOP), the metaclass, how to intercept method calls, how to deal with method missing and property missing, the use of mixins, traits and categories. All of these topics will be explained with examples in order to understand them.
Also, I’ll talk a little bit about compile-time metaprogramming with AST Transformations. AST Transformations provide a wonderful way of manipulating code at compile time via modifications of the Abstract Syntax Tree. We’ll see a basic but powerful example of what we can do with AST transformations."
The code is available at: https://github.com/lmivan/geecon2014-prague-metaprograming-with-groovy
Many developers will be familiar with lex, flex, yacc, bison, ANTLR, and other related tools that generate parsers for use inside their own code. For recognizing computer-friendly languages, however, context-free grammars and their parser-generators leave a few things to be desired. This is about how the seemingly simple prospect of parsing some text turned into a new parser toolkit for Erlang, and why functional programming makes parsing fun and awesome.
In which Richard will tell you about some things you should never (probably ever) do to or in Python. Warranties may be voided. The recording of this talk is online at http://www.youtube.com/watch?v=H2yfXnUb1S4
Topological indices (TIs) of the graphs to seek QSAR models of proteins com... – Jitendra Kumar Gupta
Currently, there is an increasing need for quick computational chemistry methods to predict protein properties accurately. This is facilitated by improvements in various bioinformatics techniques as well as the high computational power available these days. Hence, quick and fast-running techniques are being developed for analysing many macromolecules computationally.
In this sense, quantitative structure-activity relationship (QSAR) is a widely covered field, with more than 1600 molecular descriptors introduced up to now. Most of the molecular descriptors have been applied to small molecules.
Nevertheless, QSAR studies for DNA and protein sequences may be classified as an emerging field. One of the most promising applications of QSAR to proteins relates to the prediction of thermal stability, which is an essential issue in protein science.
Connectivity indices, also called topological indices (TIs), allow fast calculations. TIs are graph invariants of different kinds of protein graphs.
Interest in TIs has exploded because we can also use them to describe macromolecular and macroscopic systems represented by complex networks of interactions (links) between the different parts of a system (nodes), such as drug-target, protein-protein, metabolic, host-parasite, brain cortex, parasite disease spreading, internet, or social networks. Here, we use TIs to analyze protein-protein complexes.
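As a concrete illustration of one classical TI, a minimal Python sketch computing the Wiener index (the sum of shortest-path distances over all node pairs) for a toy interaction graph; networkx and the example edges are assumptions for illustration, not the authors' actual pipeline:
import networkx as nx

# Toy contact graph: nodes are proteins/residues, edges are interactions
G = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")])
print(nx.wiener_index(G))   # sum of all pairwise shortest-path lengths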
Odessapy2013 - Graph databases and Python – Max Klymyshyn
Page 10 "Я из Одессы я просто бухаю." translation: I'm from Odessa I just drink. Meaning his drinking a lot of "Vodka" ^_^ (@tuc @hackernews)
This is local meme - when someone asking question and you will look stupid in case you don't have answer.
Sometimes we want to keep or collect errors for later examination. We’ll use Cats to help us pick the story we want to tell when handling errors; accumulated errors or first error wins. Monads will be included!
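The talk itself uses Scala and Cats; as a language-neutral sketch of the two stories in Python (the helper names are invented for illustration):
def validate_fail_fast(checks, value):
    # First error wins: stop at the first failing check (Either-style)
    for check in checks:
        error = check(value)
        if error:
            return [error]
    return []

def validate_accumulate(checks, value):
    # Keep every error for later examination (Validated-style)
    return [e for e in (check(value) for check in checks) if e]

too_short = lambda s: "too short" if len(s) < 8 else None
no_digit = lambda s: "needs a digit" if not any(c.isdigit() for c in s) else None

print(validate_fail_fast([too_short, no_digit], "abc"))    # ['too short']
print(validate_accumulate([too_short, no_digit], "abc"))   # ['too short', 'needs a digit']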
Using the following code Install Packages pip install .pdf – picscamshoppe
Using the following code:
##Install Packages
!pip install tensorflow
!pip install matplotlib
!pip install numpy
!pip install pandas
##Import Statements
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
##Bringing in our dataset
url = 'https://raw.githubusercontent.com/BeeDrHU/Introduction-to-Python-CSPC-323-/main/sales_forecast.csv'
data = pd.read_csv(url, sep=',')
##Filtering and Cleaning
data = data[['Store', 'Date', 'Temperature', 'Fuel_Price', 'CPI', 'Unemployment', 'Weekly_Sales', 'IsHoliday_y']]
data['Date'] = pd.to_datetime(data['Date'], format='%d/%m/%Y')
data = data.set_index('Date')
data = data.sort_index()
##Checklist and Quality Assurance
data.isnull()
print(f'Number of rows with missing values: {data.isnull().any(axis=1).sum()}')
data.info()
##Subsetting variable to predict
df= data['Weekly_Sales']
df.plot()
##Train and Test Split
start_train = datetime(2010, 2, 5)
end_train = datetime(2011, 12, 30)
end_test = datetime(2012, 7, 13)
msk_train = (data.index >= start_train) & (data.index <= end_train)
msk_test = (data.index > end_train) & (data.index <= end_test)  # strictly after end_train so train and test don't overlap
df_train = df.loc[msk_train]
df_test = df.loc[msk_test]
df_train.plot()
df_test.plot()
##Normalizing our data
uni_data = df.values.astype(float)
n_train = len(df_train)                      # number of training rows, used as a split index below
uni_train_mean = uni_data[:n_train].mean()
uni_train_std = uni_data[:n_train].std()
uni_data = (uni_data - uni_train_mean) / uni_train_std
##Build the features dataset to make the model multivariate
features_considered = ['Temperature', 'Fuel_Price', 'CPI']
features = data[features_considered]
features.index = data.index
##Standardizing the Data
dataset = features.values
data_mean = dataset[:n_train].mean(axis=0)   # statistics from the training rows only
data_std = dataset[:n_train].std(axis=0)
dataset = (dataset - data_mean) / data_std
##Splitting data into training and testing
x_train = dataset[msk_train]
y_train = uni_data[msk_train]                # same mask as x_train so the lengths match
x_test = dataset[msk_test]
y_test = uni_data[msk_test]
##Defining the model architecture
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=[x_train.shape[1]]),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
##Compiling the model
model.compile(optimizer=tf.keras.optimizers.RMSprop(),
loss='mae')
##Fitting the model to the training data
history = model.fit(x_train, y_train,
epochs=100,
batch_size=64,
validation_split=0.2,
verbose=0)
##Evaluating the model on the test data
results = model.evaluate(x_test, y_test, verbose=0)
print(f'Test loss: {results}')
##Making predictions on new data
predictions = model.predict(x_test)
##Plotting the results
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(np.arange(len(y_test)), y_test, label='Actual')
ax.plot(np.arange(len(predictions)), predictions, label='Predicted')
ax.legend()
plt.title('Actual vs Predicted Weekly Sales')
plt.xlabel('Week')
plt.ylabel('Normalized Sales')
plt.show()
##Build the RNN
##Defining Function to Build.
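The post breaks off here; a minimal sketch of what the announced RNN-builder might look like with tf.keras (the LSTM choice, the layer sizes, and the window_size name are assumptions, not the original author's code):
def build_rnn(input_shape, units=32):
    # input_shape is (timesteps, features) for a windowed series
    model = tf.keras.models.Sequential([
        tf.keras.layers.LSTM(units, input_shape=input_shape),
        tf.keras.layers.Dense(1)          # one-step sales forecast
    ])
    model.compile(optimizer=tf.keras.optimizers.RMSprop(), loss='mae')
    return model

# rnn = build_rnn(input_shape=(window_size, x_train.shape[1]))  # window_size is hypothetical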
Http4s, Doobie and Circe: The Functional Web Stack – GaryCoady
Http4s, Doobie and Circe together form a nice platform for building web services. This presentations provides an introduction to using them to build your own service.
Why are we submitting this talk? Because Go is cool and we would like to hear more about this language ;-). In this talk we would like to tell you about our experience developing microservices with Go. Go enables devs to create readable, fast and concise code, which is beyond any doubt important. Apart from this we would like to leverage our test-driven habits to create bulletproof software. We will also explore other aspects important for the adoption of a new language.
This talk was part tongue-in-cheek, part serious, but entirely fun, and given twice as a lightning talk - once at Europython & once at the ACCU Python UK 05. It presents a generic Python-like language parser which does actually work. Think of it as an alternative to brackets in Lisp!
The WiMAX (IEEE 802.16e) standard offers peak data rates of 128Mbps downlink and 56Mbps uplink over 20MHz-wide channels, whilst the new standard in development, 4G WiMAN-Advanced (802.16m), is targeting the requirements to be fully 4G, using 64-QAM, BPSK and MIMO technologies to reach the 1Gbps rate. It is predicted that in an actual deployment, using 4x2 MIMO in an urban microcell application over a 20 MHz TDD channel, the 4G WiMAN-Advanced system will be able to support 120Mbps downlink and 60Mbps uplink per site concurrently. WiMAX applications are already in use in many countries globally, but research in 2010 showed that only just over 350 setups were actually in use. Many previous WiMAX operators were found to have moved to LTE, along with Yota, who were the largest WiMAX operator in the world.
So I am writing a CS code for a project and I keep getting cannot .pdf – ezonesolutions
So I am writing a CS code for a project and I keep getting "cannot open file". I know I put it into my code, but I don't know why it won't execute after I input ./a.out. I put in gcc -Wall contents3.c and no errors or warnings pop up. So I figure everything is working right. But after that, I get "Cannot open file". So I am wondering what I can do to get this settled out properly. I am putting the outcome of the project and my code. Please someone help me!
/* The six header names in the original post were stripped by HTML rendering;
   stdio.h (FILE, printf), stdlib.h, string.h and ctype.h are the likely
   candidates given the code below, and the remaining two are unrecoverable. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
int len = 0;
int sp = 0;
int lwrcaseCount = 0;
int uprcaseCount = 0;
int digitCount = 0;
int specialCount = 0;
char data[200];
int frequency[84];
int sortFrequency[84][2];
char string[] =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";
//printf("\n");
//To sort based on frequency of characters
/*void sort()
{
int max, loc, temp, x, y;
//Loops till end - 1 position of the frequency array
for(x = 0; x <= 84 - 1; x++)
{
//Initializes the max to the x position of the frequency array assuming it as maximum
max = frequency[x];
//Initializes of loc to x assuming location x contains the maximum value
loc = x;
//Loops through x + 1 to length - 1 of frequency array
for(y = x + 1; y <= 95 -1; y++)
{
//If current position of the frequency of the array is greater than the max
//then update the max with the current value of frequency and update the loc
if(frequency[y] > max)
{
max = frequency[y];
loc = y;
}//end of if
}//end of for loop
//If loc is not x then swap
if(loc != x)
{
temp = frequency[x];
frequency[x] = frequency[loc];
frequency[loc] = temp;
}//end of if
//Store the location of the maximum value in the zero column position of row x
sortFrequency[x][0]= loc;
//Store the maximum value in the first column position of row x
sortFrequency[x][1] = max;
}//end of for loop
}//end of function*/
void sort()
{
for(int i = 0; i < 84; i++)
{
sortFrequency[i][0] = i;
sortFrequency[i][1] = frequency[i];
}
for(int i = 0; i < 84-1; i++)
for(int j = 0; j < 84-i-1; j++)
if(sortFrequency[j][1] < sortFrequency[j+1][1])
{
int temp = sortFrequency[j][1];
sortFrequency[j][1] = sortFrequency[j+1][1];
sortFrequency[j+1][1] = temp;
temp = sortFrequency[j][0];
sortFrequency[j][0] = sortFrequency[j+1][0];
sortFrequency[j+1][0] = temp;
}
}
//To count frequency of each character
void frequencyCount()
{
int c, d;
//Loops till the length of the input data
for(c = 0; c < len; c++)
{
//Loops over the 84 characters in the reference string
for(d = 0; d < 84; d++)
{
//If the current data character matches the reference character
if(data[c] == string[d])
//increment that character's frequency count
frequency[d]++;
}//end of for loop
}//end of for loop
}//End of function
//Initializes the frequency array to zero
void initialize()
{
int c;
for(c = 0; c < 84; c++)
frequency[c] = 0;
}
//Read the file which contains data
void readFile()
{
//File pointer created
FILE *fptr;
char ch;
//Fi.
ITT 2015 - Saul Mora - Object Oriented Functional Programming – Istanbul Tech Talks
Functional programming is finally a first class citizen in the Cocoa toolset! But, as you may have heard, Swift is not necessarily a pure functional language. And in embracing the functional paradigm, do you need to throw out your knowledge and experience with Object Oriented programming? Saul Mora shows that it turns out you can have your cake and eat it too!
Basic NLP concepts and ideas using Python and the NLTK framework. Explore NLP processing features, compute PMI, and see how Python/NLTK can simplify your NLP-related tasks.
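A minimal sketch of the PMI computation with NLTK's collocation tools (the corpus choice is an assumption for illustration, not taken from the slides):
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
nltk.download('genesis')

words = nltk.corpus.genesis.words('english-web.txt')
finder = BigramCollocationFinder.from_words(words)
finder.apply_freq_filter(3)                        # ignore very rare pairs
bigram_measures = BigramAssocMeasures()
print(finder.nbest(bigram_measures.pmi, 10))       # top-10 bigrams ranked by PMI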
Everyone knows Python's basic datatypes and their most common containers (list, tuple, dict and set).
However, few people know that they should use a deque to implement a queue, that with defaultdict their code would be cleaner, and that they could be a bit more efficient using namedtuples instead of creating new classes.
This talk will review the data structures of Python's "collections" module of the standard library (namedtuple, deque, Counter, defaultdict and OrderedDict) and we will also compare them with the built-in basic datatypes.
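A quick sketch of the structures the talk reviews (the example values are invented for illustration):
from collections import namedtuple, deque, Counter, defaultdict, OrderedDict

Point = namedtuple('Point', ['x', 'y'])    # lightweight record, no class boilerplate
p = Point(1, 2)

queue = deque(['a', 'b'])                  # O(1) at both ends; list.pop(0) is O(n)
queue.append('c')
queue.popleft()

counts = Counter("abracadabra")            # frequency table in one line

grouped = defaultdict(list)                # no KeyError bookkeeping
grouped['vowels'].append('a')

ordered = OrderedDict([('first', 1), ('second', 2)])  # remembers insertion order
print(p, queue, counts.most_common(2), dict(grouped), ordered)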
Similar to Procesamiento del lenguaje natural con python (20)
Instructions for Submissions through G-Classroom.pptx – Jheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
Macroeconomics - Movie Location
This will be used as part of your Personal Professional Portfolio once graded.
Objective:
Prepare a presentation or a paper using research, basic comparative analysis, data organization and application of economic information. You will make an informed assessment of an economic climate outside of the United States to accomplish an entertainment industry objective.
Model Attribute Check Company Auto Property – Celine George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Operation “Blue Star” is the only event in the history of independent India where the state went to war with its own people. Even after about 40 years it is not clear whether it was the culmination of the state's anger toward the people of the region, a political game of power, or the start of a dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from the mainstream due to the denial of their just demands during a long democratic struggle since independence. As has happened all over the world, it led to a militant struggle with great loss of life among military, police and civilian personnel. The killing of Indira Gandhi and the massacre of innocent Sikhs in Delhi and other Indian cities were also associated with this movement.
Introduction to AI for Nonprofits with Tapp Network – TechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
Honest Reviews of Tim Han LMA Course Program.pptx – timhan337
Personal development courses are widely available today, with each one promising life-changing outcomes. Tim Han’s Life Mastery Achievers (LMA) Course has drawn a lot of interest. In addition to offering my frank assessment of Success Insider’s LMA Course, this piece examines the course’s effects via a variety of Tim Han LMA course reviews and Success Insider comments.
A Strategic Approach: GenAI in Education – Peter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor... – Levi Shapiro
Letter from the Congress of the United States regarding Anti-Semitism sent June 3rd to MIT President Sally Kornbluth, MIT Corp Chair, Mark Gorenberg
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
Palestine last event orientation.pptx – RaedMohamed3
An EFL lesson about the current events in Palestine. It is intended for intermediate students who wish to improve their listening skills through a short PowerPoint lesson.
Acetabularia Information For Class 9.docx – vaibhavrinwa19
Acetabularia acetabulum is a single-celled green alga that in its vegetative state is morphologically differentiated into a basal rhizoid and an axially elongated stalk, which bears whorls of branching hairs. The single diploid nucleus resides in the rhizoid.
1. Artificial Intelligence
The Python Language: Natural Language Processing.
Universidad Nacional de Ingeniería
FACULTAD DE CIENCIAS Y SISTEMAS – DEPARTAMENTO DE INFORMÁTICA
3. Natural Language Processing with Python.
Collecting data from PDF files.
!pip install PyPDF2
import PyPDF2
from PyPDF2 import PdfFileReader
pdf = open("file.pdf","rb")
pdf_reader = PyPDF2.PdfFileReader(pdf)
print(pdf_reader.numPages)
page = pdf_reader.getPage(0)
print(page.extractText())
pdf.close()
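The snippet above uses the legacy PyPDF2 (pre-3.0) names; on PyPDF2 3.x the same steps read as follows (a sketch, same file.pdf assumption):
from PyPDF2 import PdfReader

reader = PdfReader("file.pdf")
print(len(reader.pages))        # replaces pdf_reader.numPages
page = reader.pages[0]          # replaces getPage(0)
print(page.extract_text())      # replaces extractText()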
Collecting data from Word files.
!pip install python-docx
import docx
doc = open("file.docx","rb")
document = docx.Document(doc)
docu = ""
for para in document.paragraphs:
    docu += para.text
print(docu)
Collecting data from HTML files.
!pip install bs4
import urllib.request as urllib2
from bs4 import BeautifulSoup
response = urllib2.urlopen('https://en.wikipedia.org/wiki/Natural_language_processing')
html_doc = response.read()
soup = BeautifulSoup(html_doc, 'html.parser')
strhtm = soup.prettify()
print (strhtm[:1000])
print(soup.title)
print(soup.title.string)
print(soup.a.string)
print(soup.b.string)
4. # Extracting every instance of a particular tag
for x in soup.find_all('a'): print(x.string)
# Extracting all the text of a particular tag
for x in soup.find_all('p'): print(x.text)
Scraping text from the web.
Before scraping any website, blog, or e-commerce site, make sure you read the site's terms and conditions to check whether it grants permission to scrape its data.
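A minimal sketch of checking a site's robots.txt before scraping, using only the standard library (the IMDB URLs are the ones this example scrapes):
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url('http://www.imdb.com/robots.txt')
rp.read()
print(rp.can_fetch('*', 'http://www.imdb.com/chart/top'))   # False means: do not scrape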
!pip install bs4
!pip install requests
from bs4 import BeautifulSoup
import requests
import pandas as pd
from pandas import Series, DataFrame
from ipywidgets import FloatProgress
from time import sleep
from IPython.display import display
import re
import pickle
url = 'http://www.imdb.com/chart/top?ref_=nv_mv_250_6'
result = requests.get(url)
c = result.content
soup = BeautifulSoup(c, "lxml")
summary = soup.find('div', {'class':'article'})
# Creating empty lists to append the extracted data to.
moviename = []
cast = []
description = []
rating = []
ratingoutof = []
year = []
genre = []
movielength = []
rot_audscore = []
rot_avgrating = []
rot_users = []
# Extracting the required data from the html soup.
rgx = re.compile('[%s]' % '()')
5. f = FloatProgress(min=0, max=250)
display(f)
rows = summary.find('table').findAll('tr')
for row, i in zip(rows, range(len(rows))):
    for sitem in row.findAll('span', {'class':'secondaryInfo'}):
        s = sitem.find(text=True)
        year.append(rgx.sub('', s))
    for ritem in row.findAll('td', {'class':'ratingColumn imdbRating'}):
        for iget in ritem.findAll('strong'):
            rating.append(iget.find(text=True))
            ratingoutof.append(iget.get('title').split(' ', 4)[3])
    for item in row.findAll('td', {'class':'titleColumn'}):
        for href in item.findAll('a', href=True):
            moviename.append(href.find(text=True))
            rurl = 'https://www.rottentomatoes.com/m/' + href.find(text=True)
            try:
                rresult = requests.get(rurl)
            except requests.exceptions.ConnectionError:
                status_code = "Connection refused"
            rc = rresult.content
            rsoup = BeautifulSoup(rc)
            try:
                rot_audscore.append(rsoup.find('div', {'class':'meter-value'})
                    .find('span', {'class':'superPageFontColor'}).text)
                rot_avgrating.append(rsoup.find('div',
                    {'class':'audience-info hidden-xs superPageFontColor'})
                    .find('div').contents[2].strip())
                rot_users.append(rsoup.find('div',
                    {'class':'audience-info hidden-xs superPageFontColor'})
                    .contents[3].contents[2].strip())
            except AttributeError:
                rot_audscore.append("")
                rot_avgrating.append("")
                rot_users.append("")
            cast.append(href.get('title'))
            imdb = "http://www.imdb.com" + href.get('href')
            try:
                iresult = requests.get(imdb)
                ic = iresult.content
7. Generating n-grams using TextBlob.
Text = "Estoy aprendiendo PNL"
#Importando textblob
from textblob import TextBlob
#Para unigram : Use n = 1
TextBlob(Text).ngrams(1)
TextBlob(Text).ngrams(2)
Extracting noun phrases.
#Importing nltk
import nltk
from textblob import TextBlob
#Extracting the noun phrases
blob = TextBlob("John is learning natural language processing")
for np in blob.noun_phrases:
    print(np)
Finding similarity between texts
documents = ("I like NLP",
"I am exploring NLP",
"I am a beginner in NLP",
"I want to learn NLP",
"I like advanced NLP")
#Importing libraries
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
#Compute tf-idf: a feature-engineering method that builds a term-document matrix.
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)
tfidf_matrix.shape
# Output
# Compute the similarity of the first sentence with the rest of the sentences
cosine_similarity(tfidf_matrix[0:1], tfidf_matrix)
# Output
8. Phonetic matching
!pip install fuzzy
import fuzzy
soundex = fuzzy.Soundex(4)
soundex('natural')
#output
'N364'
soundex('natuaral')
#output
'N364'
soundex('language')
#output
'L52'
soundex('processing')
#output
'P625'
Part-of-speech tagging.
text = "I love NLP and I will learn NLP in 2 month"
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
stop_words = set(stopwords.words('english'))
# Tokenize the text
tokens = sent_tokenize(text)
#Generate tags for every token using a loop
for i in tokens:
    words = nltk.word_tokenize(i)
    words = [w for w in words if not w in stop_words]
    # Tagger.
    tags = nltk.pos_tag(words)
tags
#output:
[('I', 'PRP'),
('love', 'VBP'),
('NLP', 'NNP'),
('I', 'PRP'),
('learn', 'VBP'),
('NLP', 'RB'),
9. ('2month', 'CD')]
• CC coordinating conjunction.
• CD cardinal digit.
• DT determiner.
• EX existential there (like: “there is” ... think of it like “there exists”.)
• FW foreign word.
• IN preposition/subordinating conjunction.
• JJ adjective ‘big’.
• JJR adjective, comparative ‘bigger’.
• JJS adjective, superlative ‘biggest’.
• LS list marker 1.)
• MD modal could, will.
• NN noun, singular ‘desk’.
• NNS noun plural ‘desks’.
• NNP proper noun, singular ‘Harrison’.
• NNPS proper noun, plural ‘Americans’.
• PDT predeterminer ‘all the kids’.
• POS possessive ending parent’s.
• PRP personal pronoun I, he, she.
• PRP$ possessive pronoun my, his, hers.
• RB adverb very, silently.
• RBR adverb, comparative better.
• RBS adverb, superlative best.
• RP particle give up.
• TO to go ‘to’ the store.
• UH interjection.
• VB verb, base form take.
• VBD verb, past tense took.
• VBG verb, gerund/present participle taking.
• VBN verb, past participle taken.
• VBP verb, sing. present, non-3d take.
• VBZ verb, 3rd person sing. present takes.
• WDT wh-determiner which.
• WP wh-pronoun who, what.
• WP$ possessive wh-pronoun whose.
• WRB wh-adverb where, when.
10. Extracting entities from text.
sent = "John is studying at Stanford University in California"
import nltk
from nltk import ne_chunk
from nltk import word_tokenize
ne_chunk(nltk.pos_tag(word_tokenize(sent)), binary=False)
#output
Tree('S', [Tree('PERSON', [('John', 'NNP')]), ('is', 'VBZ'),
('studying', 'VBG'), ('at', 'IN'), Tree('ORGANIZATION',
[('Stanford', 'NNP'), ('University', 'NNP')]), ('in', 'IN'),
Tree('GPE', [('California', 'NNP')])])
Here "John" is tagged as "PERSON"
"Stanford" as "ORGANIZATION"
"California" as "GPE". Geopolitical entity, i.e. countries, cities, states.
#Using spaCy
import spacy
nlp = spacy.load('en')
# Read or create a sentence
doc = nlp(u'Apple is ready to launch new phone worth $10000 in New york time square ')
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
#output
Apple 0 5 ORG
10000 42 47 MONEY
New york 51 59 GPE
Sentiment analysis.
To run sentiment analysis, use TextBlob. It basically gives 2 metrics.
1. Polarity = Polarity lies in the range [-1,1], where 1 means a positive statement and -1 means a negative statement.
2. Subjectivity = Subjectivity refers to how far the text is a personal opinion rather than objective information [0,1].
review = "I like this phone. screen quality and camera clarity is really good."
review2 = "This tv is not good. Bad quality, no clarity, worst experience"
from textblob import TextBlob
#TextBlob has a pretrained sentiment prediction model
blob = TextBlob(review)
11. blob.sentiment
#output
Sentiment(polarity=0.7, subjectivity=0.6000000000000001)
blob = TextBlob(review2)
blob.sentiment
#output
Sentiment(polarity=-0.6833333333333332, subjectivity=0.7555555555555555)
# This is a negative review, since the polarity is "-0.68".
Ambiguous text.
Text1 = 'I went to the bank to deposit my money'
Text2 = 'The river bank was full of dead fishes'
#Install pywsd
!pip install pywsd
#Import functions
from nltk.corpus import wordnet as wn
from nltk.stem import PorterStemmer
from itertools import chain
from pywsd.lesk import simple_lesk
# Sentences
bank_sents = ['I went to the bank to deposit my money',
'The river bank was full of dead fishes']
# Calling the lesk function and printing the results for both sentences
print ("Context-1:", bank_sents[0])
answer = simple_lesk(bank_sents[0],'bank')
print ("Sense:", answer)
print ("Definition : ", answer.definition())
print ("Context-2:", bank_sents[1])
answer = simple_lesk(bank_sents[1],'bank','n')
print ("Sense:", answer)
print ("Definition : ", answer.definition())
#Output:
Context-1: I went to the bank to deposit my money
Sense: Synset('depository_financial_institution.n.01')
Definition : a financial institution that accepts deposits and channels the money into lending activities
Context-2: The river bank was full of dead fishes
Sense: Synset('bank.n.01')
Definition : sloping land (especially the slope beside a body of water)
12. Converting speech to text.
!pip install SpeechRecognition
!pip install PyAudio
import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Please say something")
    audio = r.listen(source)
    print("Time over, thanks")
try:
    print("I think you said: " + r.recognize_google(audio))
    #print("I think you said: " + r.recognize_google(audio, language='en-EN'))
#except sr.UnknownValueError:
    #print("Google Speech Recognition could not understand audio")
#except sr.RequestError as e:
    #print("Could not request results from Google Speech Recognition service; {0}".format(e))
except:
    pass
#Output
Please say something
Time over, thanks
I think you said: I am learning natural language processing
Converting text to speech.
!pip install gTTS
from gtts import gTTS
#choose the language: English ('en')
convert = gTTS(text='I like this NLP book', lang='en', slow=False)
# Saving the audio in an mp3 file
convert.save("audio.mp3")
#output: Please play the audio.mp3 file saved locally on your machine to hear it
Voice translation.
!pip install goslate
import goslate
text = "Bonjour le monde"
gs = goslate.Goslate()
translatedText = gs.translate(text,'en')
print(translatedText)
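Assuming Google Translate resolves the French greeting the usual way, this should print:
Hello world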