Chemical Literature (Chem 184/284)

Lecture 14, part II: SciFinder Web, Part 3:
Substance Searching: Names, Molecular Formulas and Displaying Substance Records

Searching by Chemical Substance: Identifiers and Molecular Formulas

SciFinder Web opening screen for substance searching

The second option at the top of the SciFinder Web opening screen is "substances". Selecting this option leads to the screen above. There are three basic options for searching for substances: chemical structure, molecular formula and substance identifier. The first of these, chemical structure searching, will be covered in the next lecture.

Substance Identifier

Substance Identifier input screen

The Substance Identifier input screen allows you to search by chemical name(s), or by CAS Registry Number(s). You can search more than one identifier at the same time, by putting each name or Registry Number on a separate line.

Searching for common analgesics
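To make the "one identifier per line" input concrete, here is a minimal Python sketch of how such a box of text could be split into Registry Numbers and names. It is not SciFinder's own code: the function names are hypothetical, and the only outside fact it relies on is the standard CAS check-digit rule (weight the digits 1, 2, 3, ... from the right, excluding the check digit, and take the sum modulo 10).

```python
import re

# A CAS Registry Number has 2-7 digits, then 2 digits, then 1 check digit.
CAS_RN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(text: str) -> bool:
    """Return True if text looks like a CAS RN and its check digit is correct."""
    match = CAS_RN_PATTERN.match(text.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit


def split_identifiers(box_text: str):
    """Split multi-line input into (registry_numbers, names).

    Hypothetical helper: anything that is not a valid Registry Number,
    including a mistyped one, is treated as a chemical name.
    """
    registry_numbers, names = [], []
    for line in box_text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        if is_valid_cas_rn(entry):
            registry_numbers.append(entry)
        else:
            names.append(entry)
    return registry_numbers, names


print(split_identifiers("naproxen\n50-78-2\n11061-68-0"))
# (['50-78-2', '11061-68-0'], ['naproxen'])
```

Checking the check digit before searching is a cheap way to catch a mistyped Registry Number, which would otherwise simply retrieve nothing.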

Clicking the "Search" button pulls up the brief record displays for the substances identified.

Records are displayed 3 to a line, with the highest (i.e. most recently added) Registry Numbers first. Unlike the client versions of SciFinder, SciFinder Web has no alternative sorting options.

Substance records brief display
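If you copy Registry Numbers into your own notes and want to reproduce the "highest first" order, remember that they must be compared as numbers, not as text. The sketch below assumes nothing about SciFinder's internals. The helper names are hypothetical, and the four Registry Numbers (aspirin, naproxen, acetaminophen, ibuprofen) are illustrative values chosen for this example, not taken from the screenshot.

```python
def registry_sort_key(cas_rn: str) -> int:
    """Turn a Registry Number into an integer that sorts in the same order.

    The last two segments always have exactly 2 digits and 1 digit,
    so simply dropping the hyphens preserves the numeric ordering.
    """
    return int(cas_rn.replace("-", ""))


def highest_first(cas_rns):
    """Return the Registry Numbers sorted from highest to lowest."""
    return sorted(cas_rns, key=registry_sort_key, reverse=True)


analgesics = ["50-78-2", "22204-53-1", "103-90-2", "15687-27-1"]
print(highest_first(analgesics))
# ['22204-53-1', '15687-27-1', '103-90-2', '50-78-2']
```

A plain alphabetical sort would put "50-78-2" first, because it compares character by character rather than by value.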

Linking options:
Each brief record links to the full substance record (click on the Registry Number or structure), to references in which the compound was indexed, to reactions in which the compound was indexed, and to commercial source information. At present, SciFinder Web does not link to the CHEMLIST regulatory information for substances.

Full substance record

Main record for naproxen, part 1
Main record for naproxen, part 2
Main record for naproxen, part 3
Main record for naproxen, part 4
Main record for naproxen, part 5
Main record for naproxen, part 6
Main record for naproxen, part 7
Main record for naproxen, part 8

This record derives from the Chemical Abstracts Service Registry file. Note the information provided:

Other types of substances may display other types of information. Here's the record for the antibiotic vancomycin, which is a short peptide chain.

Main record for vancomycin
Main record for vancomycin, part 2
Main record for vancomycin, part 3
Main record for vancomycin, part 4
Main record for vancomycin, part 5
Main record for vancomycin, part 6
Main record for vancomycin, part 7

Note that the amino acid sequence is given using standard one-letter codes. A table gives the modifications and cross-linking of the sequence. Longer sequences, with more than 255 non-hydrogen atoms, do not have structure diagrams: only sequence strings are given.

Substances with non-stoichiometric formulas, like metal alloys and ceramic superconductors, may display tabular compositions:

Main record for Monel metal, part 1
Main record for Monel metal, part 2
Main record for Monel metal, part 3

Experimental and Predicted Property Values

Record for naproxen IR spectrum

Other types of substances can have different types of experimental property information appropriate to the substance. Polymers and plastics, such as PTFE (polytetrafluoroethylene, aka Teflon), and metal alloys have values for properties like breakdown voltage, compressive and tensile strength, dielectric strength, electric conductivity and fracture toughness. Radioactive elements have half-lives, neutron capture cross-sections, etc.

Commercial Source Information

SciFinder Scholar automatically links from Registry file records to the corresponding record in the CHEMCATS chemical catalogs database.
Brief records list each company, when the record was last updated, the name and order number that company uses for the compound, synonyms which it recognizes and the CAS Registry number.
Note that the records list is generally in order by the date that the data was updated, most recent first. Note, too, that the availability information may be exported in a tagged format, which most spreadsheet and database software can read, for easier manipulation, e.g. sorting by name, price, etc.

Companies selling the chemical naproxen

Clicking on the company name pulls up the full record for the substance in the particular company's catalog, including pricing, addresses, phone, fax, e-mail, etc. Note that there is no guarantee that the price is current, but at least it gives a ballpark estimate.

Naproxen from Cayman Chemical

Reaction Display

When you click the "Reactions" button for a substance, you get a screen where you may select the role or roles which the desired compound plays in the reaction.

Reaction roles selection

Clicking OK then retrieves the available reactions from the CASREACT database. Note that this covers organic reactions only, and is not necessarily comprehensive.

Reactions producing naproxen
Reactions producing naproxen, part 2

Note the reactant(s), product(s), catalyst and solvent information. Yields may be provided where available. Clicking on the "detail" link here displays the single-step reactions which make up the overall reaction scheme. There is a link back to the document record where the reaction is reported. If the source information is highlighted, you may click on it to see the document record (and from there jump to full text, or "Get Citing", etc.). Reactions will be covered in much greater detail in Lecture 14.

Unlike the SciFinder client, SciFinder Web does not currently hotlink the reactants, products, reagents, solvents and catalysts displayed in the reaction diagrams.

Refining Substance Results

Just as sets of references may be "refined", so sets of substances may be narrowed down further. [Note: There is no "Analyze" option for sets of substances retrieved by name, Registry Number or molecular formula.]

Refine substances options

Analyzing Substance Sets

Unlike SciFinder Scholar, SciFinder Web does allow you to analyze substance answer sets created by means other than structure searching. The possible analysis fields are: "Commercial Availability", "Elements" (that is, the elements present in the compounds in the answer set), "Reaction Availability" and "Substance Role" (see below).

Substance answer set analysis table

"Get References" from a Substance Search

Once a set of substances has been located, you may retrieve references one compound at a time by clicking the "References" button associated with the compound record. Alternatively, you may select a set of substances by clicking on the check box next to the desired compound(s), and then clicking the "Get References" button at the bottom of the substance list screen. If you wish to retrieve references for all the substances in your starting set, it is unnecessary to check any of them.

Substance references selection screen

SciFinder Scholar allows you to select particular types of references for your set of substances. This feature makes use of selected subject terms associated with the Registry Number, including a special field in the record called Roles. In some cases, it also uses keywords associated with a given subject (e.g., crystal structure). The image below is a portion of the indexing for a paper found by looking for "analytical studies" of naproxen.

Detail of indexing of an analytical study of naproxen

Notice the list of Roles applied to naproxen (and to the other drugs). A compound may have multiple roles in a given document. Not all of the roles assigned by CAS indexers are available as limiting terms for a "Get References" in SciFinder Web (as yet). Notice, too, how the structure diagram for the compound in our search query is displayed and the Registry Number is highlighted in gray.

In addition to the Roles selection, there is also a check box that applies only to substances which are biosequences (e.g. proteins or polynucleotides). If you check this box, SciFinder will retrieve not only references containing the Registry Number(s) you have selected, but also closely related biosequences. For example, if you "Get References" for the Registry Number for human insulin, 11061-68-0, it will retrieve only records containing that Registry Number. However, if you check the box, it will also retrieve records containing the Registry Number for generic insulin, for insulin-like growth factor and others besides.

Explore by Molecular Formula

Molecular Formula query screen

Searching SciFinder Scholar for substances by molecular formula is part of the Explore menu, and requires entering the total number of each element. (Only one formula can be searched at a time.) However, the system does not require the elements to be entered in strict Hill order. So long as each element is followed by the number of occurrences of that element, the elements can be in any order, and with or without spaces between them (with certain exceptions; see below).
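For readers who have not met Hill order before, the following Python sketch shows what "any order in, Hill order out" amounts to: carbon first, then hydrogen, then the remaining elements alphabetically, or purely alphabetical when there is no carbon. This is an illustration of the convention, not SciFinder's parser. The function names are hypothetical, an element with no number is assumed to mean a count of 1, and dot-separated multi-component formulas are deliberately not handled.

```python
import re

# One element symbol (capital letter, optional lowercase letter) plus optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
WHOLE_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_formula(formula: str) -> dict:
    """Count the atoms of each element in a simple, single-component formula."""
    compact = "".join(formula.split())  # spaces between elements are optional
    if not WHOLE_FORMULA.fullmatch(compact):
        raise ValueError(f"cannot parse formula: {formula!r}")
    counts = {}
    for symbol, number in TOKEN.findall(compact):
        # Assumption: a missing number means one atom of that element.
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def hill_order(formula: str) -> str:
    """Rewrite a formula in Hill order."""
    counts = parse_formula(formula)
    if "C" in counts:
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)  # no carbon: strictly alphabetical
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


print(hill_order("Cl5 H9 C14"))  # C14H9Cl5
print(hill_order("Na2SO4"))      # Na2O4S
```

Normalizing a formula this way before comparing it with a Registry record makes it easier to see whether two differently written formulas are really the same.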

Below is a portion of the results for searching on C14H9Cl5, the molecular formula for the insecticide DDT.

Results of molecular formula search for DDT

Notice how these results may include positional isomers (including ones with indefinite positions), stereoisomers and isotopically labelled substances.

Limitations of Identifier and Molecular Formula Searching in SciFinder Web

Unfortunately, not all substances are easy to locate in SciFinder Web. This reflects the origins of the software. The original commercial version of SciFinder was marketed heavily to companies engaged primarily in organic synthesis, especially the big pharmaceutical firms. As a result, search features which are effective for organic compounds (especially structure searching) are well developed, while other features which might be of greater use to biochemists, polymer chemists, inorganic and organometallic chemists have not yet been implemented.

For example, searching by chemical name will search only for the exact name as entered. This works well for substances with common or trade names, but can be difficult to impossible when trying to use complex systematic names. Moreover, it is impossible to search for families of names, such as in plastics, dyes and other commercial substances where a whole group of materials may vary only in the latter part of the name. Example: Nylon 6, Nylon 66; Lexan 100, Lexan 110.

Many biological substances come in a wide variety of forms, but searching on them by name will only yield the most generic form. Example: Insulin appears in the Registry File in hundreds of different specific forms, depending on species of origin, etc. But a search on "insulin" will yield only the generic Registry Number. Using the "Additional related references" checkbox will find some additional material, but results are not always consistent from one substance to another.

Even for simple organic compounds, searching by name will give only the basic form of the molecule -- and leave out stereoisomers, isotopically substituted forms and the like.

Molecular formula searching has its problems as well. Remember that salts are handled in an odd fashion in print CA? Well, those idiosyncrasies carry over to SciFinder. If you are searching for sodium sulfate (commonly written as Na2SO4), entering the molecular formula as Na2SO4 or Na2O4S won't work; you'll get only a strange result for a compound with no references. H2O4S.Na2 won't work either. Only H2O4S.2Na will work. The necessary "smarts" to reinterpret other forms of the formula have not yet been built in. If you do enter the formula correctly, you will get results such as:

Substance records for sodium sulfate, part 1
Substance records for sodium sulfate, part 2
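The difference between H2O4S.Na2 and H2O4S.2Na is easier to see when the formula is built from its parts. The short Python sketch below assembles a dot-separated formula from (multiplier, formula) pairs, writing the multiplier in front of the component. It generalizes from the single sodium sulfate example given above, so treat the rule as an assumption for other salts; the function name is hypothetical.

```python
def dot_disconnected(components):
    """Join (multiplier, formula) pairs into a dot-separated formula.

    Assumption, based only on the sodium sulfate example: each component
    is written in full, and a multiplier greater than 1 goes in FRONT of
    the component ("2Na"), not after it as a subscript ("Na2").
    """
    parts = []
    for multiplier, formula in components:
        if multiplier < 1:
            raise ValueError("multiplier must be at least 1")
        parts.append(formula if multiplier == 1 else f"{multiplier}{formula}")
    return ".".join(parts)


# Sodium sulfate entered as sulfuric acid plus two sodium atoms.
print(dot_disconnected([(1, "H2O4S"), (2, "Na")]))  # H2O4S.2Na
```

When in doubt, the safest course is still the one described below: find the substance by name first and copy the formula exactly as it appears in the Registry record.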

There are some partial workarounds that will help, at least for simple organic and inorganic substances. If you know a simple name for a substance, search that first. Then get the molecular formula from that record, and search it (using the form CAS uses, of course) to pick up stereoisomers, isotopically labelled substances, etc. If the name you pick doesn't work, try searching the name in a "Research Topic" search. If you can find a reference that way, look at the indexing and you may be able to find a Registry Number that you can use as the start of an "Identifier" search. Alternatively, analyze the resulting records by Registry Number. With any luck the item you are looking for will be among the top few listed Registry Numbers, and you can verify the correct one by clicking on the link to the Registry Record. In the following example, I searched for human insulin as a Research Topic. Analyzing by Registry Number, I found the number for human insulin as the fourth in the overall list. (Note: you can find human insulin using "human insulin" in a search by Substance Identifier; this is just an example of an approach.)

Finding the Registry Number for human insulin by Analyzing a search

The first Registry Number on the list is for generic insulin. The second, however, proved to be the Registry Number for human insulin.

Substance record for human insulin, part 1
Substance record for human insulin, part 2
Substance record for human insulin, part 3

In many cases, you will have to resort to structure searching, the most powerful tool in SciFinder. Even structure searching, though, will not help for polymers, biopolymers (proteins, nucleic acids, etc.), alloys, nonstoichiometric inorganic substances (e.g. the ceramic superconductors) and many other substances. There is hope, however. The features needed to search for most of these substances exist in the Messenger command language on which SciFinder rests. Eventually, ways to tap into them in SciFinder will be implemented, but for now, some searches require resorting to the STN version of the Registry File. This will be discussed further in a later lecture.

This page created by Chuck Huber (huber@library.ucsb.edu).
Updated: 02/18/08 11:09:20