Calculated Values in New Column with Pandas and SQL
A question was posed to me on how to read in data from a text file into pandas and then generate a new column out of the existing data. After that, the next question was how to perform the same operation using SQL.
Pandas
Import the pandas library.
Then read in the data from the text file. To do so, use the DataFrame.read_clipboard()
command. First, open the text file, select all and copy (ctrl-a
, ctrl-c
).
ID Speed Time | |
C0132 94.32 3.51 | |
C5070 106.66 1.32 | |
C2145 44.01 0.54 | |
C4974 54.26 0.92 | |
C2967 78.92 2.37 | |
C0104 100.11 4.87 | |
C3603 34.23 2.93 | |
C9014 69.69 3.45 | |
C0296 50.01 0.21 | |
C7046 112.85 2.06 |
Then run the command as shown below.
Write the resulting DataFrame to a csv file…
…so that in future we can just read it in directly as needed without having to use the clipboard using the command pd.read_csv(“movement.csv”)
To calculate distance, use the required columns plugged into the formula for it (distance = speed * time
). Generate a new Distance column which is populated with the result of this calculation.
This can then be written to an Excel file.
SQL
For this portion, use the site SQLFiddle (http://sqlfiddle.com/). This example is done using PostgreSQL9.6, so make sure to select that as your database.
Copy the following text into the left portion of the window then click on the Build Schema button to execute the code.
CREATE TABLE movement( | |
id varchar(10), | |
Speed real, | |
Time real); | |
INSERT INTO movement VALUES ('C0132', 94.32, 3.51); | |
INSERT INTO movement VALUES ('C5070', 106.66, 1.32); | |
INSERT INTO movement VALUES ('C2145', 44.01, 0.54); | |
INSERT INTO movement VALUES ('C4974', 54.26, 0.92); | |
INSERT INTO movement VALUES ('C2967', 78.92, 2.37); | |
INSERT INTO movement VALUES ('C0104', 100.11, 4.87); | |
INSERT INTO movement VALUES ('C3603', 34.23, 2.93); | |
INSERT INTO movement VALUES ('C9014', 69.69, 3.45); | |
INSERT INTO movement VALUES ('C0296', 50.01, 0.21); | |
INSERT INTO movement VALUES ('C7046', 112.85, 2.06); |
Copy the following text into the right portion of the window the click the Run SQL button to execute it the code.
SELECT Speed, | |
Time, | |
Speed * Time as "Distance" | |
FROM movement |
This will generate a table that has the speed and time columns as well as the calculated distance column.
And that is how to create a new column with calculated values in pandas and SQL.