Lesson 5: Working with Local Files, Google Drive, Google Sheets, and Google Cloud Storage in Google Colab

Introduction

In Google Colab, you can easily import and export files from your local filesystem, access Google Drive, manipulate Google Sheets, and work with files stored in Google Cloud Storage (GCS).

Let’s explore how to perform these operations in a Colab notebook.

Importing Files from Your Local Filesystem

The files.upload method allows you to import files from your local filesystem. Here’s how to use it:

from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Downloading Files to Your Local Filesystem

To download files to your local system, use files.download:

from google.colab import files

with open('example.txt', 'w') as f:
  f.write('some content')

files.download('example.txt')

Accessing Google Drive

You can access Google Drive in multiple ways, including mounting your Google Drive in the runtime environment or using the Drive API.

Here’s how to mount your Google Drive:

from google.colab import drive
drive.mount('/content/drive')
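
Once mounted, your Drive appears under /content/drive and can be read and written like any local directory. Here is a minimal sketch, assuming a file path of your choosing (the path below is just a placeholder):

# Write a file into the mounted Drive (placeholder path; adjust to your own Drive)
with open('/content/drive/My Drive/example.txt', 'w') as f:
  f.write('Hello from Colab!')

# Read it back
with open('/content/drive/My Drive/example.txt', 'r') as f:
  print(f.read())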

PyDrive2

PyDrive2 is a wrapper around the Google Drive API. Here’s how to use it to import and download files:

from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Create and import a text file
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))

# Load a file by ID and print its content
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))

Drive REST API

You can also use the native Drive REST API to interact with files. Here’s an example:

from google.colab import auth
auth.authenticate_user()

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

drive_service = build('drive', 'v3')

# Create a Drive file with data from Python
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

file_metadata = {
  'name': 'Sample file',
  'mimeType': 'text/plain'
}
media = MediaFileUpload('/tmp/to_upload.txt', 
                        mimetype='text/plain',
                        resumable=True)
created = drive_service.files().create(body=file_metadata,
                                       media_body=media,
                                       fields='id').execute()
print('File ID: {}'.format(created.get('id')))

Google Sheets

To interact with Google Sheets, you can use the open-source library gspread. Here’s how to create a sheet with Python data and download that data as a pandas DataFrame:

from google.colab import auth
import gspread
from google.auth import default

auth.authenticate_user()
creds, _ = default()
gc = gspread.authorize(creds)

# Create a sheet with Python data
sh = gc.create('My cool spreadsheet')

# Download data from a sheet into Python as a pandas DataFrame
worksheet = gc.open('My cool spreadsheet').sheet1
rows = worksheet.get_all_values()
print(rows)

import pandas as pd
pd.DataFrame.from_records(rows)

Google Cloud Storage (GCS)

GCS allows you to store and retrieve data in the cloud. Here’s how to import and export files to and from GCS:

from google.colab import auth
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload, MediaIoBaseDownload
import uuid

# Authenticate and set the Google Cloud project to use
auth.authenticate_user()
project_id = 'your-project-id'  # replace with your Google Cloud project ID

gcs_service = build('storage', 'v1')

# Create a local file to import
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

# Create a bucket in the specified project
bucket_name = 'colab-sample-bucket-' + str(uuid.uuid1())
body = {
  'name': bucket_name,
  'location': 'us',
}
gcs_service.buckets().insert(project=project_id, body=body).execute()
print('Done')

# Import the file into the bucket
media = MediaFileUpload('/tmp/to_upload.txt', 
                        mimetype='text/plain',
                        resumable=True)
request = gcs_service.objects().insert(bucket=bucket_name, 
                                       name='to_upload.txt',
                                       media_body=media)
response = None
while response is None:
  _, response = request.next_chunk()
print('Upload complete')

# Download the file from the bucket
with open('/tmp/downloaded_from_gcs.txt', 'wb') as f:
  request = gcs_service.objects().get_media(bucket=bucket_name,
                                            object='to_upload.txt')
  media = MediaIoBaseDownload(f, request)
  done = False
  while not done:
    _, done = media.next_chunk()
print('Download complete')

With these examples, you can easily manipulate local files, access Google Drive, interact with Google Sheets, and use Google Cloud Storage in Google Colab to make the most out of your interactive notebooks.

Data Visualization with Plotly

Plotly in Colab: Plotly is a popular visualization library that works well in Colab. You can create interactive plots easily. Here’s a basic example:

import plotly.graph_objs as go

# Create data
x = [1, 2, 3, 4]
y = [10, 11, 12, 13]

# Create a trace
trace = go.Scatter(
    x=x,
    y=y
)

data = [trace]

# Plot the data
fig = go.Figure(data=data)
fig.show()

Machine Learning with TensorFlow: Colab is a great platform for experimenting with machine learning models using TensorFlow. Here’s a simple example of training a neural network:

import tensorflow as tf

# Define a simple neural network model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Load the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Train the model
model.fit(x_train, y_train, epochs=5)

# Evaluate the model
model.evaluate(x_test, y_test)

Data Visualization with Seaborn: Seaborn is another powerful visualization library that works well in Colab. Here’s a simple example:

import seaborn as sns
import matplotlib.pyplot as plt

# Load a sample dataset
tips = sns.load_dataset("tips")

# Create a bar plot
sns.barplot(x="day", y="total_bill", data=tips)

# Show the plot
plt.show()

Here are some examples to get you started, but there are numerous other libraries and tools available in Colab for different tasks such as data analysis, machine learning, and more. If you need additional examples or specific guidance on any topic, feel free to ask!

Here are some more code examples demonstrating how to work with local files, Google Drive, Google Sheets, and Google Cloud Storage in Google Colab:

Local Files

from google.colab import files

# Upload files from local system
uploaded = files.upload()

# Download files to local system
with open('example.txt', 'w') as f:
    f.write('some content')

files.download('example.txt')

Google Drive

from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# Write to a file in Google Drive
with open('/content/drive/My Drive/foo.txt', 'w') as f:
    f.write('Hello Google Drive!')

# Flush and unmount
drive.flush_and_unmount()
print('All changes made in this colab session should now be visible in Drive.')

Google Sheets

import gspread
from google.auth import default
from google.colab import auth

# Authenticate
auth.authenticate_user()
creds, _ = default()
gc = gspread.authorize(creds)

# Create a spreadsheet
sh = gc.create('My cool spreadsheet')

# Open the spreadsheet and write data
worksheet = gc.open('My cool spreadsheet').sheet1
cell_list = worksheet.range('A1:C2')

import random
for cell in cell_list:
    cell.value = random.randint(1, 10)

worksheet.update_cells(cell_list)

# Read data from spreadsheet into pandas DataFrame
rows = worksheet.get_all_values()
import pandas as pd
df = pd.DataFrame.from_records(rows)

Google Cloud Storage (GCS)

from google.colab import auth
auth.authenticate_user()

# Use gsutil for GCS operations
!gsutil mb gs://your-bucket-name
!gsutil cp /tmp/to_upload.txt gs://your-bucket-name/to_upload.txt

These are just a few examples to get you started. You can explore more functionalities and APIs provided by these services to perform various tasks in Google Colab.

Example for working with Local Files, Google Drive, Google Sheets, and Google Cloud Storage in Google Colab

Let’s explore a more intricate scenario that includes handling Local Files, Google Drive, Google Sheets, and Google Cloud Storage within Google Colab.

In this instance, we will showcase a sequence where you import a file from your local device, manipulate it, input the modified data into a Google Sheet, and finally store the processed data in Google Cloud Storage.

from google.colab import files
from google.colab import auth
import gspread
from google.auth import default

# Authenticate for Google Sheets
auth.authenticate_user()
creds, _ = default()
gc = gspread.authorize(creds)

# Upload a file from local system
uploaded = files.upload()

# Process the uploaded file
file_name = list(uploaded.keys())[0]  # Assuming only one file is uploaded
with open(file_name, 'r') as f:
    data = f.readlines()

# Write the processed data to a Google Sheet
sh = gc.create('Processed Data')
worksheet = sh.sheet1
for row in data:
    row_data = row.strip().split(',')
    worksheet.append_row(row_data)

# Save the processed data to Google Cloud Storage
from google.cloud import storage

# Authenticate for Google Cloud Storage
storage_client = storage.Client.from_service_account_json('/content/your-service-account.json')

# Define bucket name
bucket_name = 'your-bucket-name'

# Create a bucket if it doesn't exist
bucket = storage_client.bucket(bucket_name)
if not bucket.exists():
    bucket.create()

# Write processed data to a file and upload to Cloud Storage
blob = bucket.blob('processed_data.csv')
blob.upload_from_string(''.join(data), 'text/csv')

print('Process completed successfully!')

In this example:

  • We authenticate for Google Sheets and Google Cloud Storage.
  • Upload a file from the local system.
  • Process the uploaded file (here, we simply read the lines and split by commas, assuming it’s a CSV).
  • Write the processed data to a Google Sheet.
  • Save the processed data to Google Cloud Storage as a CSV file.

Please remember to substitute the placeholders ‘your-service-account.json’ and ‘your-bucket-name’ with the correct file path for your service account and the name of your bucket.

Furthermore, you might have to modify the processing logic depending on the format and content of the file you uploaded.
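
For instance, if the uploaded file is a CSV with a header row, letting pandas handle the parsing is usually more robust than splitting lines by hand. The snippet below is only a sketch under that assumption and reuses file_name and worksheet from the example above:

import pandas as pd

# Assumes the uploaded file is a comma-separated CSV with a header row
df = pd.read_csv(file_name)

# Write the header and the rows to the Google Sheet created above
worksheet.append_row(df.columns.tolist())
for row in df.astype(str).values.tolist():
    worksheet.append_row(row)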

Interactive Data Visualization and Saving in Google Colab

Here’s an additional illustration showcasing a more interactive workflow in Google Colab. We’ll prompt the user for chart details, process the data provided, visualize it with matplotlib, and finally store the resulting chart in Google Drive.

import matplotlib.pyplot as plt
import numpy as np
from google.colab import drive
from google.colab import files

# Mount Google Drive to save visualizations
drive.mount('/content/drive')

# Function to create and save a bar chart
def create_bar_chart(x, y, title, xlabel, ylabel, filename):
    plt.figure(figsize=(8, 6))
    plt.bar(x, y)
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig(filename)
    plt.show()

# Collect user input
title = input("Enter chart title: ")
xlabel = input("Enter x-axis label: ")
ylabel = input("Enter y-axis label: ")
data = input("Enter data points separated by commas: ")

# Process input data
data_points = [int(x.strip()) for x in data.split(',')]

# Generate x values (assuming equal spacing)
x_values = np.arange(1, len(data_points) + 1)

# Create and save the bar chart
chart_filename = '/content/drive/My Drive/bar_chart.png'
create_bar_chart(x_values, data_points, title, xlabel, ylabel, chart_filename)

# Print the Drive path where the chart was saved
print("Bar chart saved to Google Drive:")
print("Saved at:", chart_filename)

In this example:

  • We mount Google Drive to save the visualizations later.
  • Collect user input for chart title, x-axis label, y-axis label, and data points.
  • Process the input data (assuming integers separated by commas).
  • Generate x-values assuming equal spacing between data points.
  • Create and save a bar chart using matplotlib.
  • Print the Drive path of the saved chart.

You can extend this example by adding more interactivity, such as allowing users to choose the type of chart or providing options for customizing chart appearance.
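
As one possible extension, you could prompt for a chart type and branch on the answer. The sketch below reuses the inputs gathered above; the chart_type prompt and the output filename are hypothetical additions:

# Ask for a chart type (hypothetical extension of the example above)
chart_type = input("Enter chart type (bar/line/scatter): ").strip().lower()

plt.figure(figsize=(8, 6))
if chart_type == 'line':
    plt.plot(x_values, data_points, marker='o')
elif chart_type == 'scatter':
    plt.scatter(x_values, data_points)
else:
    plt.bar(x_values, data_points)  # default to a bar chart

plt.title(title)
plt.xlabel(xlabel)
plt.ylabel(ylabel)
plt.tight_layout()
plt.savefig('/content/drive/My Drive/custom_chart.png')
plt.show()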


Can Google Colab work with local files?

Google Colab has the capability to handle local files. By utilizing the files.upload() function, you can easily upload files from your local system to the Colab environment.

This function presents a dialog box where you can select the files you want to upload. Once the files are uploaded, you can conveniently access and modify them within your Colab notebook.

Can Google Colab access Google Drive files?

Yes. You can mount your Google Drive in the Colab runtime and then read and write Drive files as if they were local. The overview below covers the most common file tasks:

  • Working with local files:

from google.colab import files

uploaded = files.upload()
for filename, data in uploaded.items():
    print(f"Uploaded file '{filename}' with length {len(data)} bytes")

  • Accessing Google Drive files:

from google.colab import drive

drive.mount('/content/drive')
file_path = '/content/drive/My Drive/example.txt'
with open(file_path, 'r') as file:
    content = file.read()
print(content)

  • Using local storage in Colab:

from google.colab import files

with open('/content/sample.txt', 'w') as file:
    file.write('This is a sample text.')
files.download('/content/sample.txt')

  • Using Google Cloud Storage (GCS) in Colab:

from google.colab import auth

auth.authenticate_user()
!gsutil cp /content/sample.txt gs://your-bucket-name/sample.txt
!gsutil cp gs://your-bucket-name/sample.txt /content/sample_from_gcs.txt

Replace 'your-bucket-name' with the name of your Google Cloud Storage bucket. Each task above is paired with its corresponding code example.

How do I use local storage in Google Colab?

You can use local storage in Google Colab to read from and write to files on the Colab virtual machine. Here’s a simple example demonstrating how to write data to a file and then read it back:

# Write data to a file
with open('/content/sample.txt', 'w') as file:
    file.write('This is a sample text.')

# Read the data back
with open('/content/sample.txt', 'r') as file:
    print(file.read())

# Download the file
from google.colab import files
files.download('/content/sample.txt')

In this code:

  • We open a file named sample.txt in write mode ('w') using a context manager (with statement).
  • We write the text 'This is a sample text.' to the file.
  • We reopen the file in read mode and print its contents.
  • We use files.download() to download the file sample.txt to your local machine.

You can modify the file path and content as needed for your use case.

How to use Google Cloud Storage in Colab?

You must authenticate with your Google Cloud account before utilizing Google Cloud Storage (GCS) in Google Colab.

After that, you can interact with GCS through the gsutil command-line tool or the Python API. Check out this simple example showcasing both approaches.

  • Using gsutil:
# Authenticate with Google Cloud
from google.colab import auth
auth.authenticate_user()

# Set your project ID (the Google Cloud SDK, including gsutil, comes pre-installed in Colab)
project_id = 'your-project-id'
!gcloud config set project {project_id}

# Use gsutil to interact with Google Cloud Storage
!gsutil ls gs://your-bucket-name

Replace 'your-project-id' and 'your-bucket-name' with your actual project ID and bucket name.

  • Using Python API:
# Authenticate with Google Cloud
from google.colab import auth
auth.authenticate_user()

# Import the necessary libraries
from google.cloud import storage

# Set your project ID
project_id = 'your-project-id'

# Create a client
client = storage.Client(project=project_id)

# List the buckets in your project
buckets = list(client.list_buckets())
for bucket in buckets:
    print(bucket.name)

Replace 'your-project-id' with your actual project ID.

Here are some sample scenarios to help you get started with GCS in Google Colab. You have the ability to do things like upload files, download files, make buckets, and handle objects by utilizing the right techniques in the gsutil tool or the Python API.
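
For example, uploading and downloading a single object with the Python client could look like the sketch below; the project ID and bucket name are placeholders, and the bucket is assumed to already exist:

from google.colab import auth
from google.cloud import storage

auth.authenticate_user()

# Placeholders: replace with your project ID and an existing bucket name
client = storage.Client(project='your-project-id')
bucket = client.bucket('your-bucket-name')

# Upload a local file as an object
blob = bucket.blob('sample.txt')
blob.upload_from_filename('/content/sample.txt')

# Download the object back to the Colab VM
blob.download_to_filename('/content/sample_from_gcs.txt')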

How do I use external files in Google Colab?

To use external files in Google Colab, such as files from your local system or files stored in Google Drive, you can follow these steps:

  • Local Files:
  • Use the files.upload() method to upload files from your local system to Google Colab.
  • Use the files.download() method to download files from Google Colab to your local system. Example:
   from google.colab import files

   # Upload a file
   uploaded = files.upload()

   # Download a file
   with open('example.txt', 'w') as f:
       f.write('some content')
   files.download('example.txt')
  • Google Drive:
  • Mount your Google Drive to access files stored in it. You need to authorize Colab to access your Drive. Example:
   from google.colab import drive

   # Mount Google Drive
   drive.mount('/content/drive')

   # Access files in Drive
   with open('/content/drive/My Drive/foo.txt', 'r') as f:
       print(f.read())
  • Google Cloud Storage (GCS):
  • Authenticate with your Google Cloud account.
  • Use either the gsutil command-line tool or the Python API to interact with GCS. Example using gsutil:
   from google.colab import auth

   # Authenticate with Google Cloud
   auth.authenticate_user()

   # Use gsutil to interact with Google Cloud Storage
   !gsutil ls gs://your-bucket-name

These methods allow you to seamlessly access and work with external files within your Google Colab environment.

How to read a Google sheet in Colab?

To read a Google Sheet in Google Colab, you can use the gspread library, which allows you to interact with Google Sheets from Python. Here’s a step-by-step guide:

  • Install and Authenticate:
    First, install the gspread library using pip. You also need to authenticate your Google account.
   !pip install gspread
  • Authentication:
    Authenticate your Google account to access Google Sheets. This will prompt you to authenticate using an authentication code.
   from google.colab import auth
   auth.authenticate_user()
  • Access Google Sheets:
    Use gspread to access the Google Sheet by specifying the name of the spreadsheet you want to access.
   import gspread
   from google.auth import default

   # Authenticate and create a client
   creds, _ = default()
   gc = gspread.authorize(creds)

   # Open the Google Sheet by name
   worksheet = gc.open('Your Spreadsheet Name').sheet1
  • Read Data:
    Once you have access to the worksheet, you can read the data from it into a DataFrame or any other data structure.
   import pandas as pd

   # Read all values from the worksheet
   data = worksheet.get_all_values()

   # Convert to DataFrame
   df = pd.DataFrame(data[1:], columns=data[0])

Now, df contains the data from your Google Sheet, and you can perform any further data processing or analysis as needed.
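
From there you can treat df like any other pandas DataFrame. As a small sketch, assuming the sheet contains a numeric column (the column name 'amount' here is purely hypothetical):

# Inspect the first few rows
print(df.head())

# Convert a hypothetical numeric column from text to numbers before analysis
df['amount'] = pd.to_numeric(df['amount'], errors='coerce')
print(df['amount'].describe())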


Conclusion

To sum up, Google Colab provides a flexible platform for interactive data visualization and processing. By integrating with different libraries and services, users can effortlessly import and manipulate data from local files, Google Drive, Google Sheets, and Google Cloud Storage.

Moreover, Colab offers interactive features like forms and widgets, allowing users to personalize their analysis and visualize results in real-time.

These capabilities empower researchers, data scientists, and educators to enhance their workflows and collaborate efficiently on data-driven projects in the cloud.

