
How To Use ImportXML in Google Docs
Written by: Richard Baxter
Put me in front of a Mac and it's almost as if I
never learned to use a computer. Put me in
front of Google Spreadsheets and all of the
time I've spent working with Excel feels a
little like time wasted, and not in a good way.
I'm just not very used to a spreadsheet that
isn't Excel.
Unafraid of a challenge, I recently decided to
give Google's (exceptional) importXML,
importFEED and importHTML functions a
try. They give you the ability to fetch
information from the web to retrieve the data
you need. Mostly to
make an interesting blog post, but partly out

2016 Builtvisible

It's frustrating trying to get XML data into
Microsoft Excel unless you've got the time
and patience to build some basic macros or
VBScript for your requirements. With Google
Docs, it's really easy.

A few resources
If you want to use Google Docs to extract
data from the web, it would be a good idea
for you to learn a little XPath. XPath is used
to navigate through elements and attributes in
an XML document, or, in simple terms, you
can use XPath to fetch little bits of data
contained in structured elements like <span>,
<div> or links, or pretty much anything,
really.
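
To make the idea concrete, here is a minimal Python sketch (not part of the original post) that runs two XPath-style queries against a small, made-up fragment using the standard library's xml.etree.ElementTree. The markup, class names and values are assumptions for illustration only, and ElementTree supports just a subset of XPath, which is enough for simple selectors like these.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a fetched page.
PAGE = """
<html>
  <body>
    <div class="product">
      <span class="name">Headphones</span>
      <span class="price">99.00</span>
      <a href="/buy/headphones">Buy</a>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# .//span[@class='price'] selects every <span> whose class attribute is "price".
prices = [el.text for el in root.findall(".//span[@class='price']")]

# .//a selects every link; the href attribute is then read from each element.
links = [a.get("href") for a in root.findall(".//a")]

print(prices)
print(links)
```

The same two selectors, written as //span[@class='price'] and //a/@href, are the kind of thing you would pass as the second argument to importXML.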
Also, there are a few people who have been
doing this a while, and probably have sample
spreadsheets that blow some of the examples
below away, but you have to start
somewhere, right? If you're already an
importXML / Google Docs ninja, maybe go
and find something else to do instead of
reading this post.
If you're interested, I made a Google Docs
spreadsheet with all of the examples below:
http://bit.ly/9Fs7aF

Does anyone know?


"Does anyone know" is such an interesting
location for everyone on Twitter looking for a
very specific thing. Great if you happen to be
trading in that thing.

Try a query like this to pull through results
from the Twitter search RSS feed:
=ImportFeed("http://search.twitter.com/search.atom?q=+restaurant+%22anyone+know%22+london+OR+manchester")
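
The query string in that formula is just a URL-encoded search phrase: spaces become + and the double quotes around the exact phrase become %22. As a rough sketch, assuming you would rather generate such formulas than type them by hand, the Python below builds the URL with the standard library. The helper names build_feed_url and as_importfeed are hypothetical; only the base URL comes from the formula above.

```python
from urllib.parse import urlencode

# Base URL taken from the formula above; the helper names are hypothetical.
SEARCH_FEED = "http://search.twitter.com/search.atom"


def build_feed_url(phrase, places, topic=""):
    """Build a search-feed URL for an exact phrase plus OR-ed place names."""
    location = " OR ".join(places)
    quoted_phrase = '"' + phrase + '"'
    terms = " ".join(part for part in (topic, quoted_phrase, location) if part)
    # urlencode turns spaces into "+" and double quotes into "%22".
    query = urlencode({"q": terms})
    return SEARCH_FEED + "?" + query


def as_importfeed(url):
    """Wrap a URL in the spreadsheet formula text."""
    return '=ImportFeed("' + url + '")'


url = build_feed_url("anyone know", ["london", "manchester"], topic="restaurant")
print(as_importfeed(url))
```

Swapping the topic or the list of places gives you one formula per niche or city without re-encoding anything by hand.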

Twitter followers
A nod to Steven Foskett for this one, and
particular kudos for the mention of vCard, the
query for LinkedIn connections, Klout score
and Alexa Rank. Nice!
Try this query:
=importXML("http://twitter.com/[your-username]","//span[@id='follower_count']")
Which will give you the number of followers
you have on your Twitter profile. I added
together the total followers that my SEO team

totals up all follower counts for all UK
agencies? I wonder if there's a correlation
between that data and turnover :-)
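
For anyone who wants to see the follower-count selector and the "add them together" step outside a spreadsheet, here is a small Python sketch. It assumes each profile page contains a span with id="follower_count" holding a number with thousands separators; the sample markup, usernames and counts are invented, and real profile pages are unlikely to be well-formed XML.

```python
import xml.etree.ElementTree as ET


def follower_count(markup):
    """Return the integer inside <span id="follower_count"> in the markup."""
    root = ET.fromstring(markup)
    span = root.find(".//span[@id='follower_count']")
    if span is None or span.text is None:
        raise ValueError("no follower_count span found")
    # Strip thousands separators such as "1,204" before converting.
    return int(span.text.replace(",", "").strip())


# Hypothetical profile fragments, one per team member.
PROFILES = {
    "alice": "<div><span id='follower_count'>1,204</span></div>",
    "bob": "<div><span id='follower_count'>310</span></div>",
}

team_total = sum(follower_count(markup) for markup in PROFILES.values())
print(team_total)
```

In the spreadsheet, the equivalent is one importXML cell per username and a SUM over that column.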

Pull price data from the web
I think that, after some mild haranguing, Will
might have purchased himself a pair of
Etymotic headphones. Perhaps my pitch
would have gone slightly more efficiently
with a little xPath and Google Product search:

For something like this, a way smarter
approach to get pricing data from Amazon
would be to use their API, but you get the
point with this brief example.

Get all of your

Try something like this:
=ImportXML("http://www.yourcompetitordomain.com/sitemap.xml","//url/loc")
I mentioned doing this with Excel to find
orphaned pages, but you can have a lot more
fun with importXML. For one, theoretically
you could go off and fetch all keywords
contained in the <title> tag of each of the
URLs: an instant keyword strategy!
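
As a sketch of what //url/loc selects, the Python below pulls every loc value out of a tiny sitemap with the standard library. The sample sitemap and its example.com URLs are made up. One thing worth knowing: sitemap files normally declare the sitemaps.org namespace, and ElementTree needs that namespace spelled out in the query, which is why the path looks slightly different from the spreadsheet version.

```python
import xml.etree.ElementTree as ET

# A made-up two-URL sitemap using the standard sitemaps.org namespace.
SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>
"""

# Prefix-to-namespace mapping; "sm" is an arbitrary prefix chosen here.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(SITEMAP)

# Equivalent of //url/loc once the namespace is taken into account.
urls = [loc.text for loc in root.findall("sm:url/sm:loc", NS)]

print(urls)
```

With the list of URLs in hand, the title-tag idea above is just a second query run once per URL.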

Pull link data from Blekko
With a query like this:
=ImportXML("http://blekko.com/ws/http://builtvisible.com/+/links+/rss","//link")
Blekko is everyone's favourite new SEO tool,
and fair enough, it is quite cool. As Blekko
are happy to push their data out via RSS,
we're able to pull this data into our
spreadsheets with ImportXML (to be fair this
is really easy with Excel, unless you'd like to
create multiple columns with different
domain queries).
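
Here is a rough Python equivalent of running //link over an RSS feed, plus one extra step: counting linking domains. The feed below is invented sample data, not real Blekko output, and the shape of the real feed is an assumption. Note that //link matches the channel-level link as well as each item's link, so the sketch shows both the broad and the narrower selection.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Invented RSS 2.0 feed standing in for a link-data export.
FEED = """
<rss version="2.0">
  <channel>
    <title>Inbound links sample</title>
    <link>http://example.com/</link>
    <item><title>A</title><link>http://blog.example.org/post-1</link></item>
    <item><title>B</title><link>http://blog.example.org/post-2</link></item>
    <item><title>C</title><link>http://news.example.net/story</link></item>
  </channel>
</rss>
"""

root = ET.fromstring(FEED)

# Equivalent of //link: every <link> element, including the channel's own.
all_links = [el.text for el in root.findall(".//link")]

# Narrower selection: only the links that belong to feed items.
item_links = [el.text for el in root.findall("./channel/item/link")]

# Count how many item links come from each domain.
domains = Counter(urlparse(link).netloc for link in item_links)

print(all_links)
print(domains.most_common())
```

In the spreadsheet, tightening the query to //item/link should have the same effect as the narrower selection here.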

More Blekko link data tables
Blekko have a feature that allows for a pretty
insightful breakdown of their SEO data on
your domain. If you want to pull some of that
through into Google Docs, no problem:

Try this query:
=importhtml("http://blekko.com/ws/www.smashingmagazine.com+/seo","table",7)
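
The last argument in that formula picks which table on the page to import. As a minimal sketch of the same idea, the Python below collects every table in an HTML string with the standard library's HTMLParser and returns the one at a given position, counting from 1. The TableCollector class, the import_table helper and the sample page are all hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def import_table(html, index):
    """Return the index-th table (counting from 1) found in the HTML."""
    collector = TableCollector()
    collector.feed(html)
    if not 1 <= index <= len(collector.tables):
        raise IndexError("the page has %d table(s)" % len(collector.tables))
    return collector.tables[index - 1]


# Invented page with two tables.
PAGE = """
<html><body>
<table><tr><th>Name</th></tr><tr><td>ignored</td></tr></table>
<table>
  <tr><th>Metric</th><th>Value</th></tr>
  <tr><td>Inbound links</td><td>120</td></tr>
</table>
</body></html>
"""

second = import_table(PAGE, 2)
print(second)
```

If the table you want moves when the page layout changes, the index is the first thing to re-check.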

Have fun
This wasn't a particularly advanced post. I
did quite enjoy the thought of what to do next
with this data, though. Fetch IP addresses,
WHOIS details, root domain links or
keyword research data with Google Suggest,
the Alchemy API, or plain scraping your
competitor home pages. If you're using
importXML, I'd really like to hear how.
Anyway, as I mentioned earlier, please feel

what you did.

A little update
I got in touch with my friend Tom from
Distilled to see if he wanted to contribute.
He's been out in Vegas, but came back with a
tip to solve the problem of Google caching a
result for around two hours at a time:
Google Docs will cache a URL for ~2
hours, so if you want to crawl a URL
more often than that you need to add
a modifier to the URL.
I use int(now()*1000) to generate a
unique timestamp and then add that into
the URL in a dummy query string. E.g.
http://www.google.com/search?q=seattle+seo+consulting&pws=0&gl=us&time=1354333
The search results won't change when
you change the time value, but Google
Docs will treat it as a fresh URL and
crawl it again.
Also you can do lots of amazingly fancy
things using Google Scripts (kind of like
macros for Google Docs) but don't have a
huge amount of time to go into detail

Well, hopefully Tom will have time soon.
Thanks for contributing!
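
Tom's cache-busting tip can be sketched in a few lines of Python. This is an illustration of the technique rather than his actual setup: the add_cache_buster helper is hypothetical, and it uses milliseconds since the Unix epoch instead of the spreadsheet's now() value. (In a spreadsheet, now() is a day-based serial number, so int(now()*1000) should only change roughly every 86 seconds.) The parameter name time follows his example URL.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, param="time", now=None):
    """Return the URL with a dummy timestamp parameter appended.

    `now` is seconds since the epoch; it defaults to the current time and
    exists mainly so the function can be called reproducibly.
    """
    stamp = int((time.time() if now is None else now) * 1000)
    parts = urlsplit(url)
    # Drop any earlier timestamp so repeated calls do not pile up parameters.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "http://www.google.com/search?q=seattle+seo+consulting&pws=0&gl=us"
print(add_cache_buster(base))
```

Because the extra parameter is ignored by the target page, the response stays the same while the URL looks new to whatever is caching it.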

Tags: How To | Categories: Research,
Technical

28 thoughts on "How To Use ImportXML in Google Docs"
Sam Hamilton
17TH NOVEMBER 2010 AT 11:28

Not trying to stick up for MS but importing

/en-gb/excel-help/import-xml-dataHP010206405.aspx

Matthew Brookes
17TH NOVEMBER 2010 AT 13:28

Hi Richard,
nice article, pretty straightforward but still
good to get some ideas of what you can do.
And you can always export to Excel.
Have you taken a look at the Google Refine
product? I have been playing with it but a
lack of memory is causing me issues. It's quite
good at quickly filtering data or looking for
trends and you can pull data into it as well.
Something else to have a look at is DataSift
(from the team at TweetMeme) as that looks
to open up a lot of Twitter mashing
possibilities.

richardbaxterseo
17TH NOVEMBER 2010 AT 13:37

Hey Matt, definitely. I also think there's a
ton of mileage in Yahoo Pipes (which, unless
I'm mistaken, will happily export XML which
can be imported into Google Docs). I've got
a few macros and VBScripts to do these
things in Excel but it's quite amazing how

richardbaxterseo
17TH NOVEMBER 2010 AT 13:40

Hey Sam,
Not that easy if you want to form multiple
columns, concatenating different queries to
form varying URLs for the appropriate
XML response. It is still a bit of a pain! You
have to create a data file and it's such a
mess around compared to Google Docs. If
you have an example though, upload the
file and let's take a look. I'd be delighted to
learn!

cart2mobile
17TH NOVEMBER 2010 AT 13:42

Thanks for this update on Google
spreadsheets. I wasn't aware of "Does
anyone know?". Therefore this was really
of great help.

19:52

Wow! Now if we can only get Google Docs
to make calls to the Twitter servers that
would be great!

James Morell
18TH NOVEMBER 2010 AT 13:24

Again, not sticking up for MS but I found
the XML data tool Excel add-in really
useful over the past couple of weeks:
http://office.microsoft.com/en-us/excelhelp/create-an-xml-data-file-and-xmlschema-file-from-worksheetdata-HA010263509.aspx

Jemima
18TH NOVEMBER 2010 AT 15:16

I'm a bit of a fan of that Twitter fan count.
Do you know if it's possible to do the same
for Facebook pages, perhaps based on the
page ID?

Finding Keywords
25TH NOVEMBER 2010 AT 04:46

Is there a way I can extract from the SERP
for a keyword phrase?

13:37

XML to Google spreadsheet, nice article. Let me
try to download competitors' URLs. Thanks for
sharing.

Matt
26TH JANUARY 2011 AT 21:27

Hey Richard, thanks for an inspiring post,
but do you mind sharing the query you used
to create the columns in the Google Product
Search example?
Thanks!

Matt
27TH JANUARY 2011 AT 17:59

Never mind, I overlooked the link to your
GDoc at the bottom of the post.
Thanks

Red
5TH MARCH 2011 AT 00:07

I coincidentally tried a few of these a few
weeks ago. I generated a sitemap for a site,
stripped out everything until the URLs were
left in Excel.
I then scraped the URLs for tag and meta
description details which all worked well

I was hoping that I could somehow mass
edit some title tags in a marathon manner.
Unfortunately GDocs doesn't support much
pasting into the spreadsheet and only
supports 50 =importxml queries.
Is there any way to use GDocs to ref the
XPath code to then create a follow like
instance that will affect a sequence of say
500 cells in a column? Otherwise it's
pointless and I'll have to learn PHP, RoR
+ regular expressions, and I don't want to
do that yet. Life is too short!
Whilst I'm here, does anyone find the
XPath tools at liquidXML any good for
these SEO scraping functions?

Mihai C.
24TH MARCH 2011 AT 12:28

I am trying to use the function importxml()
but without success.
Maybe you can help me. I want to extract a
currency exchange rate from an XML file, the
EUR figure only:
http://www.bnr.ro/nbrfxrates.xml
Nothing works :(
=importXML("http://www.bnr.ro/nbrfxrates.xml","//DataSet/Body/Cube/Rate[EUR]")

WMG
16TH JUNE 2011 AT 12:00

Hmmm, does anyone know if GDocs is
being flaky for scraping nowadays? Just
tried doing a lil GDocs scraping project that
I created months ago.
I'm scraping the SERPs using importXML.
Now I get the SERP results in GDocs but
when I paste these into Excel it does
something weird and encodes everything.
It used to work a treat a few months back. I
could paste the cells into Excel exactly as
the GDocs spreadsheet displayed them.
Now it seems to concatenate URL results and
add weird encoded characters. I've tried
paste special etc. Is GDocs defunct for
scraping now?
eg. Eg.
#VALUE! #VALUE!
http://www.markosweb [dot] com/www/forex-handel-online.blogspot [dot] com/
http://www.freeadsboard [dot] com/index.php/topic/134024-everythingfor-forex-handel/
Anyone know how to bypass this?

WMG

too:
http://www.google.com/support/forum/p/Google%20Docs/thread?tid=19733fc7fb48ecd5&hl=en
There's a few tidbits there for anyone
seeking help. Not sure how useful these are
as yet.

Ryan Boots
27TH JULY 2011 AT 14:32

I've found this to be enormously useful.
However, when I couldn't find any online
string builder to help build the importXML
strings, I decided to create my own.
http://www.xpathbuilder.com/
It's still very much a work in progress, so
I'd love some feedback for ideas for future
improvements.

richardbaxterseo
27TH JULY 2011 AT 16:04

Awesome! Tweeted

Jeremy
11TH AUGUST 2011 AT 17:04

A simple (I use that loosely) would be to

1 Spreadsheet would use =importrange
which pulls in data from other spreadsheets.
The other spreadsheets would use the
=importxml to get the actual data you want.

Wikiopens
18TH OCTOBER 2011 AT 04:13

You should learn XPath to get more
information you need

Red
18TH OCTOBER 2011 AT 09:43

@Ryan Boots Just had a play with that
xpathbuilder, really neat and intuitive.
It's a bit like something I created for
Google Docs bulk scraping. Does the Bing
search return 100 results?
Re. =importxml("http://www.bing.com/search?q=kiss+my+ass&count=100","//div[@class='sb_tlst']//h3//a/@href")
Any way to bring back 1000 results in
Bing?

Saul
19TH DECEMBER 2011 AT 15:20

Hi Ryan, great post, but have you noticed
the XPath query you use to grab the Twitter
followers is not working?

query did not return any data

Do you know what is going on or how to
update the query so that it works?
Thank you!

Maire
28TH DECEMBER 2011 AT 20:23

Thanks so much for this post.
I wanted to set up a financial spreadsheet to
help my daughter pick out some safe
stocks with good dividend yields. Here are
my queries.
Append stock ticker (add ?hl=en if you are
in a non-US locale):
http://finance.yahoo.com/q/ks?s=
Different screen scrapers:
//tr[td/text()[contains(.,'Forward Annual Dividend Yield')]]/td[2]
//tr[td/text()[contains(.,'Revenue Growth')]]/td[2]
//tr[td/text()[contains(.,'Earnings Growth')]]/td[2]
//tr[td/text()[contains(.,'Current Ratio')]]/td[2]
//tr[td/text()[contains(.,'PEG Ratio')]]/td[2]
//tr[td/text()[contains(.,'Return on Assets')]]/td[2]

//tr[td/text()[contains(.,'3 month')]]/td[2] 3 month volume
//tr[td/text()[contains(.,'10 day')]]/td[2] 10 day volume

Maire
29TH DECEMBER 2011 AT 18:54

It looks like the pages are not localized so
&hl=en-US is not needed.

Ragu
17TH MARCH 2012 AT 00:00

AlexaRank ImportXML function no longer
worked for me as of October 2011

Bob Jones
15TH JUNE 2012 AT 04:04

The first link in the article is broken. Looks
like Google merged the page into this list:
https://support.google.com/docs/bin/static.py?hl=en&topic=25273&page=table.cs&path=1361471-1360901-1360868-1397170

Dave
14TH AUGUST 2012 AT 22:45

Thank you. Embarrassed to say I am a very
late starter into XPath and Google Docs &
having found this post you give some

reading the posts from the guys at Distilled
into using Google Docs to perform quick
checks for rankings in the SERPs but it
seems that I am too late and they have now
stopped that function from working. Still, it's
great to gain exposure to this and get ideas
on how to use these tools. Thank you.

Bilal AHmed
15TH OCTOBER 2015 AT 13:36

Hi Friends,
Need some help.
In the ImportXML feature of Google Sheets,
using this code
=importxml(A1,"//div[@class='detail']")
from the link http://www.fabingo.com/-english-p-500.html
I get that value: BookFort EXPORT ED
(English)Author:Bernard
CornwellISBN:0007331754ISBN13:9780007331758Binding:PaperbackPublishing
Date:2011 MayPublisher:HarperCollins
PublishersLanguage:EnglishNumber Of
pages:400Dimensions:6.81,4.25Weight:272
grams
Dealsnoffers.pk Test Sheet:
https://docs.google.com/spreadsheets/d/1LkFFa3AO9fKHjI3knJWBzoPh6_YjApskYnq0feNXWpM/edit#gid=0

Looking forward to any help
Thanks
